
Optimizing Long-term Predictions for Model-based Policy Search

Teaser

We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays, and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories, as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive with state-of-the-art model learning methods. In contrast to these more involved models, our model can be employed directly for policy search and outperforms a baseline method in the robot experiment.
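
As a rough illustration of the contrast drawn in the teaser, the following sketch fits a toy dynamics model once by minimizing one-step-ahead prediction error and once by minimizing the error of full rollouts over an observed trajectory. It is not the paper's implementation; the 1-D linear system, variable names, squared-error rollout objective, and optimizer choice are assumptions made only for illustration.

# Illustrative sketch only: a toy 1-D linear system, fit (a) with the common
# one-step-ahead objective and (b) with a long-term rollout objective over an
# observed trajectory. Names and the toy setup are assumptions, not the
# paper's actual method.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Ground-truth dynamics x_{t+1} = a_true * x_t + b_true * u_t, observed with noise.
a_true, b_true = 0.95, 0.5
T, obs_noise = 50, 0.05

u = rng.uniform(-1.0, 1.0, size=T)                 # applied inputs
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + b_true * u[t]
y = x + obs_noise * rng.standard_normal(T + 1)     # noisy observations

def one_step_loss(params):
    # Standard criterion: squared error of one-step-ahead predictions,
    # each step conditioned on the (noisy) observed state.
    a, b = params
    pred = a * y[:-1] + b * u
    return np.sum((y[1:] - pred) ** 2)

def long_term_loss(params):
    # Long-term criterion: roll the model out from the initial observation
    # and score the whole predicted trajectory against the observations.
    a, b = params
    pred = np.zeros(T + 1)
    pred[0] = y[0]
    for t in range(T):
        pred[t + 1] = a * pred[t] + b * u[t]
    return np.sum((y[1:] - pred[1:]) ** 2)

fit_one_step = minimize(one_step_loss, x0=np.zeros(2), method="Nelder-Mead").x
fit_long_term = minimize(long_term_loss, x0=np.zeros(2), method="Nelder-Mead").x

print("one-step fit :", fit_one_step)
print("long-term fit:", fit_long_term)
print("true params  :", [a_true, b_true])

With noisy observations, the one-step objective conditions every prediction on corrupted states, whereas the rollout objective evaluates the model on the quantity actually needed for policy search: long-term predictions. The sketch uses a squared rollout error for simplicity; the paper instead optimizes the likelihood of observed trajectories under a generative long-term prediction model.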

Author(s): Andreas Doerr and Christian Daniel and Duy Nguyen-Tuong and Alonso Marco and Stefan Schaal and Marc Toussaint and Sebastian Trimpe
Book Title: Proceedings of 1st Annual Conference on Robot Learning (CoRL)
Volume: 78
Pages: 227-238
Year: 2017
Month: November
Editors: Sergey Levine and Vincent Vanhoucke and Ken Goldberg
BibTeX Type: Conference Paper (conference)
Event Name: 1st Annual Conference on Robot Learning
Event Place: Mountain View, CA, USA
State: Published

BibTeX

@conference{doerr2017optimizing,
  title = {Optimizing Long-term Predictions for Model-based Policy Search},
  booktitle = {Proceedings of 1st Annual Conference on Robot Learning (CoRL)},
  abstract = {We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays, and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories, as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive with state-of-the-art model learning methods. In contrast to these more involved models, our model can be employed directly for policy search and outperforms a baseline method in the robot experiment.},
  volume = {78},
  pages = {227--238},
  editor = {Sergey Levine and Vincent Vanhoucke and Ken Goldberg},
  month = nov,
  year = {2017},
  slug = {doerr_corl_2017},
  author = {Doerr, Andreas and Daniel, Christian and Nguyen-Tuong, Duy and Marco, Alonso and Schaal, Stefan and Toussaint, Marc and Trimpe, Sebastian},
  month_numeric = {11}
}