Why This is (Not) the End of Research in Generative AI: Stable Diffusion & the Revolution in Visual Synthesis

ORGANIZERS

Perceiving Systems

Michael Black

Director

Recently, deep generative modeling has become the most prominent paradigm for learning powerful representations of our (visual) world and for generating novel samples thereof. At the same time, most of the progress has come from scaling up models, to the point where development seemed to be restricted to a few big tech companies with boundless resources, with implications for future (academic) research, industry, and society.

This talk will contrast the most commonly used generative models to date, with a particular focus on denoising diffusion probabilistic models. Despite their enormous potential, these models come with their own specific limitations. We will then discuss a solution, latent diffusion models, a.k.a. "Stable Diffusion", that significantly improves the efficiency of diffusion models. Billions of training samples can now be summarized in compact representations that render high-quality synthesis feasible on consumer hardware.
We will then discuss recent extensions that offer an interesting perspective on future generative modeling: rather than having powerful likelihood models memorize local image details, we focus their representational power on scene composition. Time permitting, the talk will also cover approaches to video synthesis and post-hoc interpretation of the learned neural representations.

Speaker Biography

Prof. Dr. Björn Ommer (University of Munich)

Björn Ommer is a full professor at the University of Munich, where he heads the Computer Vision and Learning Group. Previously, he was a full professor in the department of mathematics and computer science at Heidelberg University and a co-director of the IWR and the HCI. He received his diploma in computer science from the University of Bonn and his PhD from ETH Zurich. Thereafter, he was a postdoc in the vision group of Jitendra Malik at UC Berkeley. Björn serves as an associate editor for IEEE T-PAMI. His research interests include semantic scene understanding and retrieval, generative AI and visual synthesis, self-supervised metric and representation learning, and explainable AI. Moreover, he applies this basic research in interdisciplinary projects within the digital humanities and the life sciences. His group has published a series of generative approaches, including "VQGAN" and "Stable Diffusion", which are now democratizing the creation of visual content and have already opened up an abundance of new directions in research, industry, the media, and beyond.
