Zero-Shot Offline Imitation Learning via Optimal Transport

Autonomous Learning Empirical Inference Miscellaneous 2024

Empirical Inference, Autonomous Learning

Georg Martius

Senior Research Scientist

Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.
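As a rough illustration of what "a distance between occupancies" can look like in practice, the following minimal sketch compares two sets of sampled states with entropy-regularized optimal transport (Sinkhorn iterations). It is not the authors' implementation. The function names, the `eps` and `n_iters` parameters, and the use of Euclidean distance as the ground cost are assumptions made for illustration; in the setting the abstract describes, the ground cost would instead be derived from a learned goal-conditioned value function, and the agent's states would come from rollouts of a learned world model.

```python
import numpy as np


def pairwise_cost(agent_states, expert_states):
    """Ground cost between every agent state and every expert state.

    Assumption: Euclidean distance is a placeholder for a cost derived
    from a learned goal-conditioned value function.
    """
    diff = agent_states[:, None, :] - expert_states[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def sinkhorn_cost(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT between two uniform empirical distributions.

    Returns the transport cost <plan, cost> and the transport plan.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weights over agent states
    b = np.full(m, 1.0 / m)  # uniform weights over expert states
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost)), plan


# Hypothetical data: an expert demonstration and two candidate rollouts.
t = np.linspace(0.0, 1.0, 5)
expert = np.stack([t, t], axis=1)
rollout_close = expert.copy()
rollout_far = expert + 3.0

cost_close, _ = sinkhorn_cost(pairwise_cost(rollout_close, expert))
cost_far, _ = sinkhorn_cost(pairwise_cost(rollout_far, expert))
print(cost_close < cost_far)
```

A planner could rank candidate action sequences by such a cost and prefer the rollout whose state distribution is closest to the demonstration as a whole, rather than chasing one goal at a time. The plain kernel `exp(-cost / eps)` can underflow when costs are large relative to `eps`; a log-domain variant would be more robust.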

Author(s):	Rupf, Thomas and Bagatella, Marco and Gürtler, Nico and Frey, Jonas and Martius, Georg
Year:	2024
Month:	October
Day:	11

BibTeX Type:	Miscellaneous (misc)

Eprint:	arXiv:2410.08751
State:	Submitted
URL:	https://arxiv.org/abs/2410.08751

BibTeX

@misc{rupf2024:ZILOT,
  title = {Zero-Shot Offline Imitation Learning via Optimal Transport},
  abstract = {Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.},
  month = oct,
  year = {2024},
  slug = {rupf2024-zilot},
  author = {Rupf, Thomas and Bagatella, Marco and G{\"u}rtler, Nico and Frey, Jonas and Martius, Georg},
  eprint = {arXiv:2410.08751},
  url = {https://arxiv.org/abs/2410.08751},
  month_numeric = {10}
}
