Statistical Learning Theory
The goal of learning theory is to analyze the statistical and computational properties of learning algorithms and to provide guarantees on their performance. The department has contributed to this field by providing analyses of algorithms in three important areas: 1) sample-efficient learning, 2) non-parametric distribution comparison, and 3) generative modeling.
On sample-efficient learning, we provide a formal analysis of compressing a data sample so as to encode a set of functions consistent with the data []. In [] we provide a novel analysis of a lifelong learning setup; unlike previous studies, our work more explicitly identifies the conditions of task relatedness that enable sample-efficient learning. In [] we show that active learning can provide label savings in non-parametric learning settings, in contrast to most previous work, which addresses parametric learning.
Non-parametric distribution comparison
Our focus in this area is on estimating the kernel mean embedding (KME) of distributions and its applications. Inspired by the classic James-Stein estimator, we introduced a kernel mean shrinkage estimator (KMSE) and proved that it can converge faster than the plug-in KME estimator []. Related to this, in [] we study the optimality of KME estimators in the minimax sense and show that the rate $O(n^{-1/2})$ is achieved by the plug-in KME estimator, the KMSE, and other known estimators. We also study minimax optimal estimation of the maximum mean discrepancy (MMD), defined as the RKHS distance between KMEs: $\mathrm{MMD}(P,Q) := \|\mu_P - \mu_Q\|_{\mathcal{H}}$ [].
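As a concrete reference point, here is a minimal sketch of the plug-in MMD estimator mentioned above. The Gaussian kernel, bandwidth, and function names are our illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 bandwidth^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd_plugin(X, Y, bandwidth=1.0):
    """Biased (V-statistic) plug-in estimate of MMD(P, Q) = ||mu_P - mu_Q||
    from samples X ~ P and Y ~ Q: plug the empirical KMEs into the RKHS norm."""
    k_xx = gaussian_kernel(X, X, bandwidth).mean()
    k_yy = gaussian_kernel(Y, Y, bandwidth).mean()
    k_xy = gaussian_kernel(X, Y, bandwidth).mean()
    return np.sqrt(k_xx + k_yy - 2.0 * k_xy)

# Example: two Gaussian samples with shifted means.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd_plugin(X, Y))
```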
The properties of MMD are known to depend on the underlying kernel and have been linked to three fundamental concepts: universal, characteristic, and strictly positive definite kernels. In [] we show that these concepts are essentially equivalent and give the first complete characterization of those kernels whose associated MMD metrizes the weak convergence of probability measures. We further derive necessary and sufficient conditions for MMD to metrize tight convergence to a fixed target distribution [].
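For reference, the three kernel properties admit the following standard definitions, with $\mathcal{X}$ the input domain, $\mathcal{H}$ the RKHS of the kernel $k$, and $\mu_P := \mathbb{E}_{x \sim P}[k(x, \cdot)]$ the KME:

```latex
\[
\begin{aligned}
&\text{characteristic:} && P \mapsto \mu_P \ \text{is injective, i.e.}\
  \mathrm{MMD}(P,Q) = 0 \iff P = Q,\\
&\text{universal:} && \mathcal{H}\ \text{is dense in}\ C(\mathcal{X})\
  \text{(classically stated for compact metric}\ \mathcal{X}\text{)},\\
&\text{strictly positive definite:} && \textstyle\sum_{i,j} c_i c_j\, k(x_i, x_j) > 0\
  \text{for all distinct}\ x_1,\dots,x_n\ \text{and}\ c \neq 0.
\end{aligned}
\]
```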
Building on these analyses, we propose a three-sample test for comparing the relative fit of two models [], generalizing standard nonparametric two-sample testing. In [] we further extend these results to derive a nonparametric goodness-of-fit test for conditional density models, one of the few tests of its kind.
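Schematically, and reusing mmd_plugin from the sketch above, the relative-fit comparison reduces to a difference of two MMD estimates. The function below is our illustrative simplification: the cited test additionally characterizes the null distribution of this difference to produce calibrated p-values.

```python
def relative_fit_statistic(model_p_samples, model_q_samples, data, bandwidth=1.0):
    """Difference of MMDs between each candidate model's samples and the
    observed data; negative values favor the first model. Omits the
    calibration (null distribution / p-value) that the actual test provides."""
    mmd_p = mmd_plugin(model_p_samples, data, bandwidth)
    mmd_q = mmd_plugin(model_q_samples, data, bandwidth)
    return mmd_p - mmd_q
```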
Generative modeling
We have proposed a number of theoretically grounded generative models based on generative adversarial networks (GANs) and variational autoencoders (VAEs). In [] we study the training of mixtures of generative models from a theoretical perspective and find a globally optimal closed-form solution for performing greedy updates while approximating an unknown distribution with mixtures in any given f-divergence. While training objectives in VAEs and GANs are based on f-divergences, it has been argued that other divergences, in particular optimal transport distances, may be better suited to the needs of generative modeling. In [], starting from Kantorovich's primal formulation of the optimal transport problem, we show that it can be equivalently written in terms of probabilistic encoders that are constrained to match the latent posterior and prior distributions. We then apply this result to train latent variable generative models []. When relaxed, the constrained optimization problem leads to a new regularized autoencoder algorithm which we call Wasserstein auto-encoders (WAEs); a sketch of the resulting objective appears below. In [] and [] we focus on properties of the latent representations learned by WAEs and show that fundamental problems arise when training WAEs with deterministic encoders if the intrinsic dimensionality of the data differs from that of the latent space. In [] we propose a new generative procedure based on kernel mean matching that generates images given a seed image set. This allows us to turn an unconditional GAN into a conditional generative procedure without retraining.
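As an illustration of the relaxed objective, here is a minimal PyTorch-style sketch of a WAE training loss with an MMD penalty on the latent codes (the WAE-MMD variant). The Gaussian kernel, the encoder/decoder callables, and the value of lam are our assumptions for the sketch, not prescriptions from the cited papers.

```python
import torch

def _rbf(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between two batches of latent codes."""
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dists / (2.0 * bandwidth**2))

def latent_mmd(z, z_prior, bandwidth=1.0):
    """Biased MMD^2 estimate between encoded codes and prior samples."""
    return (_rbf(z, z, bandwidth).mean()
            + _rbf(z_prior, z_prior, bandwidth).mean()
            - 2.0 * _rbf(z, z_prior, bandwidth).mean())

def wae_mmd_loss(x, encoder, decoder, sample_prior, lam=10.0):
    """Reconstruction cost plus a penalty pushing the aggregate
    posterior q(z) toward the prior p(z), as in the relaxed WAE objective."""
    z = encoder(x)                    # deterministic encoder: x -> z
    x_rec = decoder(z)                # decoder: z -> x
    rec = ((x - x_rec) ** 2).flatten(1).sum(dim=1).mean()
    penalty = latent_mmd(z, sample_prior(z.shape[0]))
    return rec + lam * penalty
```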