New Optimizers for Foundation Models

We aim to accelerate the training of large language models through novel optimizers. Grounded in theoretical insights into neural loss landscapes, our work bridges foundational research and practical algorithm design, addressing key challenges in training efficiency and scalability for foundation models across several application domains.
Our NGN optimizer [], rooted in a novel expansion technique with strong theoretical guarantees, has shown promise in vision and text applications at scale, surpassing strong Adam baselines on networks with billions of parameters. Beyond practical algorithmic advances such as NGN, the group engages in foundational research, exploring the geometric properties of neural network loss functions [], the complexities of nested min-max problems [], and optimization dynamics in convex settings, including acceleration [] and limit-cycle phenomena [].
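To give a flavor of what an adaptive, loss-aware step-size rule of this kind looks like, the sketch below applies an NGN-style step size to a toy least-squares problem. The specific formula, function names, and constants are illustrative assumptions made for this example, not the exact algorithm from the cited work.

```python
import numpy as np

def ngn_style_step_size(gamma, loss, grad_sq_norm, eps=1e-12):
    """Illustrative NGN-style adaptive step size (assumed form):
    shrinks the base rate gamma when the gradient is large
    relative to the current non-negative loss value."""
    return gamma / (1.0 + gamma * grad_sq_norm / (2.0 * loss + eps))

# Toy non-negative objective: least squares f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

x = np.zeros(10)
gamma = 0.5  # base step size (hypothetical choice)
for t in range(200):
    residual = A @ x - b
    loss = 0.5 * float(residual @ residual)
    grad = A.T @ residual
    step = ngn_style_step_size(gamma, loss, float(grad @ grad))
    x -= step * grad

print(f"final loss: {0.5 * np.sum((A @ x - b) ** 2):.6f}")
```

The appeal of such rules is that the effective step shrinks automatically when gradients are large relative to the loss, which is one route to reducing sensitivity to the base learning rate; the full method and its guarantees are described in the referenced paper.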
We continuously develop and evaluate new optimization algorithms tailored to foundation models such as large language models, aiming to improve training efficiency, memory usage, and final performance. Current solutions, including the widely used Adam optimizer, suffer from instability and sensitivity to hyperparameters. By advancing the theoretical understanding of adaptive methods and rigorously testing new algorithms at scale, the project aims to address critical bottlenecks in training large models. Outcomes include publicly available code, detailed reports of our findings, and benchmarks that provide actionable guidance for academic and industry practitioners.