
EXPLORING BY EXPLOITING BAD MODELS IN MODEL-BASED REINFORCEMENT LEARNING

Exploration for reinforcement learning (RL) is well studied for model-free methods but remains a relatively unexplored topic for model-based methods. In this work, we investigate several exploration techniques injected into the two stages of model-based RL: (1) during optimization: adding transition-space and action-space noise when optimizing a policy using learned dynamics, and (2) after optimization: injecting action-space noise when executing an optimized policy on the real environment. When given a good deterministic dynamics model, like the ground-truth simulation, exploration can significantly improve performance. However, using randomly initialized neural networks to model environment dynamics can _implicitly_ induce exploration in model-based RL, reducing the need for explicit exploratory techniques. Surprisingly, we show that in the case of a local optimizer, using a learned model with this implicit exploration can actually _outperform_ using the ground-truth model without exploration, while adding exploration to the ground-truth model reduces the performance gap. However, the learned models are highly local, in that they perform well _only_ for the task for which they are optimized, and fail to generalize to new targets.
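
As a rough illustration (not the authors' implementation), the sketch below shows where the two kinds of noise described in the abstract could enter a generic model-based RL loop: transition-space and action-space noise during rollouts under the learned dynamics (stage 1), and action-space noise when executing on the real environment (stage 2). The function names (`rollout_under_model`, `execute_on_real_env`) and parameters (`action_noise_std`, `transition_noise_std`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rollout_under_model(policy, dynamics_model, x0, horizon,
                        action_noise_std=0.0, transition_noise_std=0.0, rng=None):
    """Stage (1): roll out a policy under a (learned) dynamics model,
    optionally injecting action-space and transition-space noise."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    states, actions = [x], []
    for _ in range(horizon):
        u = np.asarray(policy(x), dtype=float)
        u = u + action_noise_std * rng.standard_normal(u.shape)      # action-space noise
        x = np.asarray(dynamics_model(x, u), dtype=float)
        x = x + transition_noise_std * rng.standard_normal(x.shape)  # transition-space noise
        states.append(x)
        actions.append(u)
    return np.stack(states), np.stack(actions)

def execute_on_real_env(policy, env_step, x0, horizon,
                        action_noise_std=0.0, rng=None):
    """Stage (2): execute an optimized policy on the real environment,
    optionally injecting action-space noise."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(horizon):
        u = np.asarray(policy(x), dtype=float)
        u = u + action_noise_std * rng.standard_normal(u.shape)      # action-space noise
        x = np.asarray(env_step(x, u), dtype=float)
        trajectory.append(x)
    return np.stack(trajectory)
```

In a full pipeline of this kind, a policy optimizer would evaluate candidate policies with `rollout_under_model` against the learned dynamics, and `execute_on_real_env` would then gather real-environment trajectories for evaluation and for refitting the model.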

Author(s): Yixin Lin and Sarah Bechtle and Ludovic Righetti and Akshara Rai and Franziska Meier
Book Title: International Conference on Learning Representations
Year: 2020
Bibtex Type: Conference Paper (conference)
Event Place: Addis Ababa, Ethiopia
State: Published
Digital: True
Electronic Archiving: grant_archive

BibTex

@conference{Yixin2020EXPLORING,
  title = {EXPLORING BY EXPLOITING BAD MODELS IN MODEL-BASED REINFORCEMENT LEARNING},
  booktitle = {International Conference on Learning Representations},
  abstract = {Exploration for reinforcement learning (RL) is well studied for model-free methods but remains a relatively unexplored topic for model-based methods. In this work, we investigate several exploration techniques injected into the two stages of model-based RL: (1) during optimization: adding transition-space and action-space noise when optimizing a policy using learned dynamics, and (2) after optimization: injecting action-space noise when executing an optimized policy on the real environment. When given a good deterministic dynamics model, like the ground-truth simulation, exploration can significantly improve performance. However, using randomly initialized neural networks to model environment dynamics can _implicitly_ induce exploration in model-based RL, reducing the need for explicit exploratory techniques. Surprisingly, we show that in the case of a local optimizer, using a learned model with this implicit exploration can actually _outperform_ using the ground-truth model without exploration, while adding exploration to the ground-truth model reduces the performance gap. However, the learned models are highly local, in that they perform well _only_ for the task for which they are optimized, and fail to generalize to new targets.},
  year = {2020},
  slug = {yixin2020exploring},
  author = {Lin, Yixin and Bechtle, Sarah and Righetti, Ludovic and Rai, Akshara and Meier, Franziska}
}