
Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers


Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, which makes these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor which prevents the optimizer from collapsing to the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door, without reward shaping.
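
The abstract only outlines the main ingredients (CEM planning, a guidance cost toward the current policy, dataset aggregation, and an adaptive factor), so the following is a minimal, hypothetical Python sketch of how these pieces could fit together. All names here (`cem_with_policy_guidance`, `adaptive_guidance_weight`, `cost_fn`, `policy_plan`) are illustrative assumptions, not the authors' code or exact formulation.

```python
import numpy as np

def adaptive_guidance_weight(policy_cost, optimizer_cost, base=0.1):
    """Shrink the guidance term while the policy is still much worse than the
    optimizer, so early in training the planner is not pulled toward the
    learner's (still poor) behavior. Illustrative heuristic only."""
    ratio = optimizer_cost / max(policy_cost, 1e-8)   # <= 1 when the optimizer is better
    return base * float(np.clip(ratio, 0.0, 1.0))

def cem_with_policy_guidance(cost_fn, policy_plan, horizon=30, act_dim=2,
                             pop_size=200, n_elites=20, iters=5,
                             guidance_weight=0.1):
    """CEM over action sequences with an extra cost that keeps elites close to
    the current policy's plan (a sketch, assuming a model-based rollout cost)."""
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current distribution.
        samples = mean + std * np.random.randn(pop_size, horizon, act_dim)
        task_cost = np.array([cost_fn(seq) for seq in samples])
        # Guidance cost: distance of each candidate to the policy's plan.
        guide_cost = np.linalg.norm(samples - policy_plan, axis=(1, 2))
        total = task_cost + guidance_weight * guide_cost
        # Refit the sampling distribution to the elite candidates.
        elites = samples[np.argsort(total)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean  # planned action sequence; mean[0] would be executed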

Author(s): Cristina Pinneri*, Shambhuraj Sawant*, Sebastian Blaes, and Georg Martius
Book Title: The Ninth International Conference on Learning Representations (ICLR)
Year: 2021
Month: May
Bibtex Type: Conference Paper (inproceedings)
Event Name: 9th International Conference on Learning Representations (ICLR 2021)
State: Published
URL: https://openreview.net/forum?id=Nc3TJqbcl3
Article Number: 1844
Note: *equal contribution

BibTeX

@inproceedings{pinneri2021:strong-policies,
  title = {Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers},
  booktitle = {The Ninth International Conference on Learning Representations (ICLR)},
  abstract = {Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, which makes these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor which prevents the optimizer from collapsing to the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door, without reward shaping.},
  month = may,
  year = {2021},
  note = {*equal contribution},
  slug = {pinneri2021-strong-policies},
  author = {Pinneri*, Cristina and Sawant*, Shambhuraj and Blaes, Sebastian and Martius, Georg},
  url = {https://openreview.net/forum?id=Nc3TJqbcl3},
  month_numeric = {5}
}