
Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers


Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, which makes these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor which prevents the optimizer from collapsing to the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door, without reward shaping.
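
The abstract only outlines the main ingredients (CEM planning, a guidance cost toward the current policy, dataset aggregation, and an adaptive factor), so the following is a minimal, hypothetical Python sketch of how these pieces could fit together. All names here (`cem_with_policy_guidance`, `adaptive_guidance_weight`, `cost_fn`, `policy_plan`) are illustrative assumptions, not the authors' code or exact formulation.

```python
import numpy as np

def adaptive_guidance_weight(policy_cost, optimizer_cost, base=0.1):
    """Shrink the guidance term while the policy is still much worse than the
    optimizer, so early in training the planner is not pulled toward the
    learner's (still poor) behavior. Illustrative heuristic only."""
    ratio = optimizer_cost / max(policy_cost, 1e-8)   # <= 1 when the optimizer is better
    return base * float(np.clip(ratio, 0.0, 1.0))

def cem_with_policy_guidance(cost_fn, policy_plan, horizon=30, act_dim=2,
                             pop_size=200, n_elites=20, iters=5,
                             guidance_weight=0.1):
    """CEM over action sequences with an extra cost that keeps elites close to
    the current policy's plan (a sketch, assuming a model-based rollout cost)."""
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current distribution.
        samples = mean + std * np.random.randn(pop_size, horizon, act_dim)
        task_cost = np.array([cost_fn(seq) for seq in samples])
        # Guidance cost: distance of each candidate to the policy's plan.
        guide_cost = np.linalg.norm(samples - policy_plan, axis=(1, 2))
        total = task_cost + guidance_weight * guide_cost
        # Refit the sampling distribution to the elite candidates.
        elites = samples[np.argsort(total)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean  # planned action sequence; mean[0] would be executed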

Author(s): Cristina Pinneri*, Shambhuraj Sawant*, Sebastian Blaes, and Georg Martius
Book Title: The Ninth International Conference on Learning Representations (ICLR)
Year: 2021
Month: May
Bibtex Type: Conference Paper (inproceedings)
Event Name: 9th International Conference on Learning Representations (ICLR 2021)
State: Published
URL: https://openreview.net/forum?id=Nc3TJqbcl3
Article Number: 1844
Note: *equal contribution

BibTeX

@inproceedings{pinneri2021:strong-policies,
  title = {Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers},
  booktitle = {The Ninth International Conference on Learning Representations (ICLR)},
  abstract = {Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, which makes these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor which prevents the optimizer from collapsing to the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door, without reward shaping.},
  month = may,
  year = {2021},
  note = {*equal contribution},
  slug = {pinneri2021-strong-policies},
  author = {Pinneri*, Cristina and Sawant*, Shambhuraj and Blaes, Sebastian and Martius, Georg},
  url = {https://openreview.net/forum?id=Nc3TJqbcl3},
  month_numeric = {5}
}