PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Empirical Inference Conference Paper 2011

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
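The "probabilistic dynamics model" in the abstract is, in the paper, a Gaussian process. As a rough illustration of what it means for such a model to report its own uncertainty, here is a minimal sketch of GP regression on a one-dimensional toy system. It is not the authors' implementation: the toy dynamics (`sin`), the kernel hyperparameters, the noise level, and all function names are assumptions chosen for illustration. It also omits the long-term uncertainty propagation and analytic policy gradients that PILCO adds on top.

```python
import math

def rbf(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential kernel between two scalar inputs (hyperparameters assumed)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_from_lower(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_star, noise_var=1e-2):
    """Posterior mean and variance of f(x_star) given training pairs (xs, ys)."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, ys))  # K^{-1} y
    k_star = [rbf(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var

# Hypothetical transition data: the next state is sin(state).
states = [-2.0, -1.0, 0.0, 1.0, 2.0]
next_states = [math.sin(s) for s in states]

# Predictive variance is small near observed states and reverts to the
# prior variance far away from them.
mean_near, var_near = gp_predict(states, next_states, 0.5)
mean_far, var_far = gp_predict(states, next_states, 6.0)
print(f"near data: mean={mean_near:.3f}, var={var_near:.4f}")
print(f"far away:  mean={mean_far:.3f}, var={var_far:.4f}")
```

The growth of the predictive variance away from the data is the quantity a planner can take into account instead of trusting a single point estimate of the dynamics. This is the sense in which the abstract speaks of reducing model bias.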

Author(s):	Deisenroth, MP. and Rasmussen, CE.
Book Title:	Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Pages:	465-472
Year:	2011
Editors:	L Getoor and T Scheffer
Publisher:	Omnipress

Bibtex Type:	Conference Paper (inproceedings)

Event Place:	Bellevue, Washington, USA


BibTeX

@inproceedings{DeisenrothRT2011,
  title = {PILCO: A Model-Based and Data-Efficient Approach to Policy Search},
  booktitle = {Proceedings of the 28th International Conference on Machine Learning, ICML 2011},
  abstract = {In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.},
  pages = {465--472},
  editor = {L Getoor and T Scheffer},
  publisher = {Omnipress},
  year = {2011},
  slug = {deisenrothrt2011},
  author = {Deisenroth, MP. and Rasmussen, CE.}
}
