Recently, deep generative modeling has become the most prominent paradigm for learning powerful representations of our (visual) world and for generating novel samples thereof. At the same time, most of the progress has come from scaling up models, to the point where development seemed to be restricted to a few big tech companies with boundless resources, with implications for future (academic) research, industry, and society.
This talk will contrast the most commonly used generative models to date, with a particular focus on denoising diffusion probabilistic models. Despite their enormous potential, these models come with their own specific limitations. We will then discuss a solution, latent diffusion models a.k.a. "Stable Diffusion", that significantly improves the efficiency of diffusion models. Billions of training samples can now be summarized in compact representations that render high-quality synthesis feasible on consumer hardware. We will also discuss recent extensions that cast an interesting perspective on future generative modeling: rather than having powerful likelihood models memorize local image details, we focus their representational power on scene composition. Time permitting, the talk will also cover approaches to video synthesis and post-hoc interpretation of the learned neural representations.
Prof. Dr. Björn Ommer (University of Munich)
Björn Ommer is a full professor at the University of Munich, where he heads the Computer Vision and Learning Group. Before that, he was a full professor in the department of mathematics and computer science at Heidelberg University and a co-director of the IWR and the HCI. He received his diploma in computer science from the University of Bonn and his PhD from ETH Zurich. Thereafter, he was a postdoc in the vision group of Jitendra Malik at UC Berkeley. Björn serves as an associate editor for IEEE T-PAMI. His research interests include semantic scene understanding and retrieval, generative AI and visual synthesis, self-supervised metric and representation learning, and explainable AI. Moreover, he applies this basic research in interdisciplinary projects within the digital humanities and the life sciences. His group has published a series of generative approaches, including "VQGAN" and "Stable Diffusion", which are now democratizing the creation of visual content and have already opened up an abundance of new directions in research, industry, the media, and beyond.