Model-Based Reinforcement Learning with Continuous States and Actions

Empirical Inference Conference Paper 2008

Jan Peters

Research Group Leader

Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging, and approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing-up task. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
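The model-learning step mentioned in the abstract (building a GP model for the transition dynamics) can be illustrated with a small sketch. This is not the authors' implementation: the pendulum parameters, the random sampling of training transitions, the fixed kernel hyperparameters, and all names (`pendulum_step`, `sq_exp_kernel`, `GPDynamics`) are assumptions made for illustration, and only NumPy is used. The GP is trained on state differences, which is one common choice and not something the abstract specifies.

```python
import numpy as np

def pendulum_step(state, torque, dt=0.05):
    """One explicit Euler step of a damped pendulum (illustrative parameters)."""
    g, length, mass, friction = 9.81, 1.0, 1.0, 0.05  # assumed values
    theta, omega = state
    accel = (torque - friction * omega
             - mass * g * length * np.sin(theta)) / (mass * length**2)
    return np.array([theta + dt * omega, omega + dt * accel])

def sq_exp_kernel(A, B, lengthscales, signal_var):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    A, B = A / lengthscales, B / lengthscales
    sq_dist = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

class GPDynamics:
    """GP regression from (angle, velocity, torque) to the change in state.

    Hyperparameters are fixed by hand here (an assumption); in practice they
    would usually be fitted, for example by maximizing the marginal likelihood.
    """

    def __init__(self, X, Y, lengthscales, signal_var=1.0, noise_var=1e-4):
        self.X, self.ls, self.sv = X, lengthscales, signal_var
        K = sq_exp_kernel(X, X, lengthscales, signal_var)
        self.L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
        # alpha = (K + noise_var * I)^{-1} Y, computed via the Cholesky factor
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))

    def predict(self, X_query):
        """Posterior mean (M, 2) and latent variance (M,) at the query inputs."""
        K_star = sq_exp_kernel(X_query, self.X, self.ls, self.sv)
        mean = K_star @ self.alpha
        v = np.linalg.solve(self.L, K_star.T)
        var = np.maximum(self.sv - (v**2).sum(axis=0), 0.0)
        return mean, var

rng = np.random.default_rng(0)
low = np.array([-np.pi, -6.0, -5.0])   # assumed ranges: angle, velocity, torque
high = np.array([np.pi, 6.0, 5.0])

def sample_transitions(n):
    """Draw random (state, action) inputs and the resulting state changes."""
    X = rng.uniform(low, high, size=(n, 3))
    Y = np.array([pendulum_step(x[:2], x[2]) - x[:2] for x in X])
    return X, Y

X_train, Y_train = sample_transitions(400)
model = GPDynamics(X_train, Y_train, lengthscales=np.array([1.0, 4.0, 4.0]))

X_test, Y_test = sample_transitions(200)
mean, var = model.predict(X_test)
rmse = float(np.sqrt(np.mean((mean - Y_test) ** 2)))
print(f"held-out RMSE of predicted state change: {rmse:.4f}")
```

In a model-based setup along the lines the abstract describes, a planner such as GPDP would then query `model.predict` in place of the true dynamics; the value-function and policy steps are not shown here.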

Author(s):	Deisenroth, MP. and Rasmussen, CE. and Peters, J.
Book Title:	ESANN 2008
Journal:	Advances in Computational Intelligence and Learning: Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2008)
Pages:	19-24
Year:	2008
Month:	April
Editors:	Verleysen, M.
Publisher:	d-side

Bibtex Type:	Conference Paper (inproceedings)

Address:	Evere, Belgium
Event Name:	European Symposium on Artificial Neural Networks
Event Place:	Bruges, Belgium

Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

BibTeX

@inproceedings{4977,
  title = {Model-Based Reinforcement Learning with Continuous States and Actions},
  journal = {Advances in Computational Intelligence and Learning: Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2008)},
  booktitle = {ESANN 2008},
  abstract = {Finding an optimal policy in a reinforcement learning (RL) framework with
  continuous state and action spaces is challenging. Approximate solutions
  are often inevitable. GPDP is an approximate dynamic programming algorithm
  based on Gaussian process (GP) models for the value functions. In
  this paper, we extend GPDP to the case of unknown transition dynamics.
  After building a GP model for the transition dynamics, we apply GPDP
  to this model and determine a continuous-valued policy in the entire state
  space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.},
  pages = {19-24},
  editor = {Verleysen, M.},
  publisher = {d-side},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Evere, Belgium},
  month = apr,
  year = {2008},
  slug = {4977},
  author = {Deisenroth, MP. and Rasmussen, CE. and Peters, J.},
  month_numeric = {4}
}
