Statistical Learning Theory
The goal of learning theory is to analyze the statistical and computational properties of learning algorithms and to provide guarantees on their performance. The department has contributed to this field by providing analyses of algorithms in three important areas: 1) sample-efficient learning, 2) non-parametric distribution comparison, and 3) generative modeling.
On sample-efficient learning, we provide a formal analysis of compressing a data sample so as to encode a set of functions consistent with the data []. In [] we provide a novel analysis of a lifelong learning setup; unlike previous studies, our work more explicitly identifies the conditions of task relatedness that enable sample-efficient learning. In [] we show that active learning can provide label savings in non-parametric learning settings, in contrast to most previous work, which addresses parametric learning.
Non-parametric distribution comparison
Our focus in this area is on estimating the kernel mean embedding (KME) of distributions and its applications. Inspired by the classic James-Stein estimator, we introduced a kernel mean shrinkage estimator (KMSE) and proved that it can converge faster than the plug-in KME estimator []. Related to this, in [] we study the optimality of KME estimators in the minimax sense and show that the rate $O(n^{-1/2})$ is achieved by the plug-in KME estimator, the KMSE, and other known estimators. We also study minimax optimal estimation of the maximum mean discrepancy (MMD), defined as the RKHS distance between KMEs: $\mathrm{MMD}(P,Q) := \|\mu_P - \mu_Q\|_{\mathcal{H}}$ [].
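As a concrete reference point, here is a minimal sketch of the plug-in MMD estimator mentioned above. The Gaussian kernel, bandwidth, and function names are our illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 bandwidth^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd_plugin(X, Y, bandwidth=1.0):
    """Biased (V-statistic) plug-in estimate of MMD(P, Q) = ||mu_P - mu_Q||
    from samples X ~ P and Y ~ Q: plug the empirical KMEs into the RKHS norm."""
    k_xx = gaussian_kernel(X, X, bandwidth).mean()
    k_yy = gaussian_kernel(Y, Y, bandwidth).mean()
    k_xy = gaussian_kernel(X, Y, bandwidth).mean()
    return np.sqrt(k_xx + k_yy - 2.0 * k_xy)

# Example: two Gaussian samples with shifted means.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd_plugin(X, Y))
```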
The properties of MMD are known to depend on the underlying kernel and have been linked to three fundamental concepts: universal, characteristic, and strictly positive definite kernels. In [] we show that these concepts are essentially equivalent and give the first complete characterization of those kernels whose associated MMD metrizes the weak convergence of probability measures. We further derive necessary and sufficient conditions for MMD to metrize tight convergence to a fixed target distribution [].
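For reference, the three kernel properties admit the following standard definitions, with $\mathcal{X}$ the input domain, $\mathcal{H}$ the RKHS of the kernel $k$, and $\mu_P := \mathbb{E}_{x \sim P}[k(x, \cdot)]$ the KME:

```latex
\[
\begin{aligned}
&\text{characteristic:} && P \mapsto \mu_P \ \text{is injective, i.e.}\
  \mathrm{MMD}(P,Q) = 0 \iff P = Q,\\
&\text{universal:} && \mathcal{H}\ \text{is dense in}\ C(\mathcal{X})\
  \text{(classically stated for compact metric}\ \mathcal{X}\text{)},\\
&\text{strictly positive definite:} && \textstyle\sum_{i,j} c_i c_j\, k(x_i, x_j) > 0\
  \text{for all distinct}\ x_1,\dots,x_n\ \text{and}\ c \neq 0.
\end{aligned}
\]
```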
Building on these analyses, we propose a three-sample test for comparing the relative fit of two models [], generalizing standard nonparametric two-sample testing. In [] we further extend these results to derive a nonparametric goodness-of-fit test for conditional density models, one of the few tests of its kind.
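Schematically, and reusing mmd_plugin from the sketch above, the relative-fit comparison reduces to a difference of two MMD estimates. The function below is our illustrative simplification: the cited test additionally characterizes the null distribution of this difference to produce calibrated p-values.

```python
def relative_fit_statistic(model_p_samples, model_q_samples, data, bandwidth=1.0):
    """Difference of MMDs between each candidate model's samples and the
    observed data; negative values favor the first model. Omits the
    calibration (null distribution / p-value) that the actual test provides."""
    mmd_p = mmd_plugin(model_p_samples, data, bandwidth)
    mmd_q = mmd_plugin(model_q_samples, data, bandwidth)
    return mmd_p - mmd_q
```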
Generative modeling
We have proposed a number of theoretically grounded generative models based on generative adversarial networks (GANs) and variational autoencoders (VAEs). In [] we study the training of mixtures of generative models from a theoretical perspective and find a globally optimal closed-form solution for performing greedy updates while approximating an unknown distribution with mixtures in any given f-divergence. While training objectives in VAEs and GANs are based on f-divergences, it has been argued that other divergences, in particular optimal transport distances, may be better suited to the needs of generative modeling. In [], starting from Kantorovich's primal formulation of the optimal transport problem, we show that it can be equivalently written in terms of probabilistic encoders that are constrained to match the latent posterior and prior distributions. We then apply this result to train latent variable generative models []. When relaxed, the constrained optimization problem leads to a new regularized autoencoder algorithm which we call Wasserstein auto-encoders (WAEs); a sketch of the resulting objective appears below. In [] and [] we focus on properties of the latent representations learned by WAEs and show that fundamental problems arise when training WAEs with deterministic encoders if the intrinsic dimensionality of the data differs from that of the latent space. In [] we propose a new generative procedure based on kernel mean matching that generates images given a seed image set. This allows us to turn an unconditional GAN into a conditional generative procedure without retraining.
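As an illustration of the relaxed objective, here is a minimal PyTorch-style sketch of a WAE training loss with an MMD penalty on the latent codes (the WAE-MMD variant). The Gaussian kernel, the encoder/decoder callables, and the value of lam are our assumptions for the sketch, not prescriptions from the cited papers.

```python
import torch

def _rbf(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between two batches of latent codes."""
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dists / (2.0 * bandwidth**2))

def latent_mmd(z, z_prior, bandwidth=1.0):
    """Biased MMD^2 estimate between encoded codes and prior samples."""
    return (_rbf(z, z, bandwidth).mean()
            + _rbf(z_prior, z_prior, bandwidth).mean()
            - 2.0 * _rbf(z, z_prior, bandwidth).mean())

def wae_mmd_loss(x, encoder, decoder, sample_prior, lam=10.0):
    """Reconstruction cost plus a penalty pushing the aggregate
    posterior q(z) toward the prior p(z), as in the relaxed WAE objective."""
    z = encoder(x)                    # deterministic encoder: x -> z
    x_rec = decoder(z)                # decoder: z -> x
    rec = ((x - x_rec) ** 2).flatten(1).sum(dim=1).mean()
    penalty = latent_mmd(z, sample_prior(z.shape[0]))
    return rec + lam * penalty
```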