EXPLORING BY EXPLOITING BAD MODELS IN MODEL-BASED REINFORCEMENT LEARNING
Exploration for reinforcement learning (RL) is well studied for model-free methods but remains relatively unexplored for model-based methods. In this work, we investigate several exploration techniques injected into the two stages of model-based RL: (1) during optimization: adding transition-space and action-space noise when optimizing a policy using learned dynamics, and (2) after optimization: injecting action-space noise when executing an optimized policy on the real environment. When given a good deterministic dynamics model, such as the ground-truth simulation, exploration can significantly improve performance. However, using randomly initialized neural networks to model environment dynamics can _implicitly_ induce exploration in model-based RL, reducing the need for explicit exploratory techniques. Surprisingly, we show that in the case of a local optimizer, using a learned model with this implicit exploration can actually _outperform_ using the ground-truth model without exploration, while adding exploration to the ground-truth model reduces the performance gap. However, the learned models are highly local, in that they perform well _only_ for the task for which they are optimized, and fail to generalize to new targets.
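To make the two noise-injection stages concrete, here is a minimal NumPy sketch, not the paper's implementation: `policy`, `learned_dynamics`, and `real_env` (with its `reset`/`step` interface) are hypothetical placeholders, and the Gaussian noise scales are illustrative assumptions.

```python
import numpy as np

def rollout_with_learned_model(policy, learned_dynamics, s0, horizon,
                               action_noise=0.1, transition_noise=0.05, rng=None):
    """Stage (1): simulate a rollout under the learned dynamics model,
    injecting both action-space and transition-space noise during policy optimization."""
    rng = rng or np.random.default_rng()
    states, actions = [s0], []
    s = s0
    for _ in range(horizon):
        a_mean = policy(s)
        a = a_mean + action_noise * rng.standard_normal(a_mean.shape)        # action-space noise
        s_next = learned_dynamics(s, a)
        s_next = s_next + transition_noise * rng.standard_normal(s_next.shape)  # transition-space noise
        actions.append(a)
        states.append(s_next)
        s = s_next
    return states, actions

def execute_on_real_env(policy, real_env, horizon, action_noise=0.1, rng=None):
    """Stage (2): execute the optimized policy on the real environment,
    injecting action-space noise at execution time."""
    rng = rng or np.random.default_rng()
    s = real_env.reset()
    trajectory = []
    for _ in range(horizon):
        a_mean = policy(s)
        a = a_mean + action_noise * rng.standard_normal(a_mean.shape)        # action-space noise
        s_next, reward, done = real_env.step(a)                              # placeholder env interface
        trajectory.append((s, a, reward, s_next))
        s = s_next
        if done:
            break
    return trajectory
```

In this sketch, the same additive Gaussian perturbation is reused for both stages; the paper distinguishes where the noise is injected (model-based optimization vs. real-environment execution), not the particular noise distribution.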
@conference{Yixin2020EXPLORING,
  title     = {EXPLORING BY EXPLOITING BAD MODELS IN MODEL-BASED REINFORCEMENT LEARNING},
  booktitle = {International Conference on Learning Representations},
  abstract  = {Exploration for reinforcement learning (RL) is well studied for model-free methods but remains relatively unexplored for model-based methods. In this work, we investigate several exploration techniques injected into the two stages of model-based RL: (1) during optimization: adding transition-space and action-space noise when optimizing a policy using learned dynamics, and (2) after optimization: injecting action-space noise when executing an optimized policy on the real environment. When given a good deterministic dynamics model, such as the ground-truth simulation, exploration can significantly improve performance. However, using randomly initialized neural networks to model environment dynamics can _implicitly_ induce exploration in model-based RL, reducing the need for explicit exploratory techniques. Surprisingly, we show that in the case of a local optimizer, using a learned model with this implicit exploration can actually _outperform_ using the ground-truth model without exploration, while adding exploration to the ground-truth model reduces the performance gap. However, the learned models are highly local, in that they perform well _only_ for the task for which they are optimized, and fail to generalize to new targets.},
  year      = {2020},
  slug      = {yixin2020exploring},
  author    = {Lin, Yixin and Bechtle, Sarah and Righetti, Ludovic and Rai, Akshara and Meier, Franziska}
}