Research Overview

The Deep Models and Optimization Group works on improving the efficiency and capabilities of deep learning models. Our approach is rooted in the mathematical theory of optimization, and our interests span language models, vision systems, and biological data. The group, established in October 2023, is a joint research group of the Max Planck Institute for Intelligent Systems and the ELLIS Institute Tübingen, led by Antonio Orvieto.
Our mission is to design new optimizers and neural network architectures that accelerate deep learning technology and AI-powered science. Our approach is theoretical, with a strong focus on mathematical optimization as a tool for understanding the challenging training dynamics of modern neural networks. With a more robust theoretical foundation, we envision a future where scientists and engineers, regardless of their resource constraints, can leverage powerful and reliable deep learning tools to help make the world a better place.
Our primary focus is advancing deep learning methods for processing data with long-range interactions, with notable applications in language modeling, genome sequence analysis, and music generation. Dr. Antonio Orvieto, who leads our group, introduced the LRU model [] in 2023, a novel and efficient long-range reasoning block that now powers a fast-inference architecture in Google’s Gemma family. In 2024 we extended this line of work with new architectures designed to accelerate sequential processing and mathematical tools for analyzing their capabilities []. Beyond these advancements, we explore applications in 3D vision [] and graph data [], with a particular emphasis on the biological domain (understanding non-coding DNA).
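To give a flavor of the kind of architecture involved, the sketch below shows a minimal diagonal linear recurrence of the sort underlying LRU-style layers: a complex-valued, element-wise state update followed by a real output projection. The parameter shapes, the stability parameterization, and all variable names here are illustrative assumptions rather than the published design.

```python
import numpy as np

# Minimal sketch of an LRU-style layer: a diagonal, complex-valued linear
# recurrence x_t = lam * x_{t-1} + B u_t, followed by a real projection.
# All shapes and the exact parameterization are illustrative assumptions.

rng = np.random.default_rng(0)
d_in, d_state, seq_len = 4, 8, 32

# Stable diagonal state transition: eigenvalues strictly inside the unit disk,
# parameterized by a log-magnitude term (nu) and a phase (theta).
nu = rng.uniform(0.0, 1.0, d_state)
theta = rng.uniform(0.0, 2 * np.pi, d_state)
lam = np.exp(-np.exp(nu) + 1j * theta)          # diagonal of the transition matrix

B = (rng.normal(size=(d_state, d_in)) + 1j * rng.normal(size=(d_state, d_in))) / np.sqrt(2 * d_in)
C = (rng.normal(size=(d_in, d_state)) + 1j * rng.normal(size=(d_in, d_state))) / np.sqrt(d_state)

u = rng.normal(size=(seq_len, d_in))            # input sequence
x = np.zeros(d_state, dtype=np.complex128)      # hidden state

outputs = []
for t in range(seq_len):
    x = lam * x + B @ u[t]                      # element-wise (diagonal) recurrence
    outputs.append((C @ x).real)                # project back to real outputs

y = np.stack(outputs)                           # shape (seq_len, d_in)
print(y.shape)
```

Because the recurrence is linear and diagonal, each state coordinate evolves independently, which is what makes such layers cheap to parallelize over long sequences.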
In addition to architectural innovation, we focus on enhancing state-of-the-art neural network training strategies. Our NGN optimizer [] has demonstrated promising performance on vision and text tasks, matching Adam at billion-parameter scales. This work builds on our improved understanding of neural network loss landscapes [] and on robust hyperparameter tuning strategies for growing network and data scales []. Our efforts in understanding efficient training also earned us fourth place in AlgoPerf (together with the SEAL group), the premier deep learning training speed competition hosted by Google and Meta. Our submission was the top non-industry entry, achieved with far fewer computational resources than those of the leading tech giants.
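As an illustration of the flavor of adaptive step size involved, the sketch below applies a Gauss-Newton-style rule to a toy quadratic: writing a non-negative loss f as f = ½ r² and taking a regularized Gauss-Newton step yields a step size that interpolates between a constant-step update and a Polyak-like step. The step-size formula, the constant c, and the toy problem are illustrative assumptions and do not reproduce the published NGN optimizer exactly.

```python
import numpy as np

def gauss_newton_style_step(params, loss_fn, grad_fn, c=0.2):
    """One update with a Gauss-Newton-flavored adaptive step size.

    Illustrative rule: step = c / (1 + c * ||g||^2 / (2 f)), which behaves
    like plain gradient descent with step c when gradients are small and
    like a Polyak-type step when gradients are large relative to the loss.
    """
    f = loss_fn(params)
    g = grad_fn(params)
    step = c / (1.0 + c * np.dot(g, g) / (2.0 * f + 1e-12))
    return params - step * g

# Toy usage on an ill-conditioned quadratic (assumed example, not from the paper).
A = np.diag([1.0, 10.0])
loss_fn = lambda w: 0.5 * w @ A @ w
grad_fn = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(50):
    w = gauss_newton_style_step(w, loss_fn, grad_fn, c=0.2)
print(loss_fn(w))  # loss decreases toward zero
```

The appeal of step-size rules of this form is that they adapt automatically to the local scale of the loss, which is one reason adaptive methods remain competitive with hand-tuned schedules at large scale.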