
Gaussian Process Dynamic Programming

Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value-function-based RL algorithm. We consider both a classic optimal control problem, where problem-specific prior knowledge is available, and a classic RL problem, where only very general priors can be used. For the classic optimal control problem, GPDP models the unknown value functions with Gaussian processes and generalizes dynamic programming to continuous-valued states and actions. For the RL problem, GPDP starts from a given initial state and explores the state space using Bayesian active learning. To design a fast learner, available data has to be used efficiently. Hence, we propose to learn probabilistic models of the a priori unknown transition dynamics and the value functions on the fly. In both cases, we successfully apply the resulting continuous-valued controllers to the under-actuated pendulum swing-up and analyze the performance of the suggested algorithms. It turns out that GPDP uses data very efficiently and can be applied to problems where classic dynamic programming would be cumbersome.
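The abstract describes GPDP only at a high level. As a rough illustration of the idea for the classic optimal control setting (known dynamics and reward), the following is a minimal sketch, not the authors' implementation: a Gaussian process is fitted to value estimates at a small set of support states after each dynamic-programming sweep, so that the Bellman backup can query the value function at arbitrary continuous successor states. The 1-D dynamics f, reward r, discount factor, support grid, and the use of scikit-learn's GaussianProcessRegressor are all assumptions made for this sketch.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def f(x, u):
    # Hypothetical known transition dynamics (placeholder, not the pendulum).
    return np.clip(x + 0.1 * u, -1.0, 1.0)

def r(x, u):
    # Hypothetical immediate reward: penalize distance from the goal state 0.
    return -(x ** 2) - 0.01 * u ** 2

gamma = 0.95                              # discount factor (assumed)
X = np.linspace(-1.0, 1.0, 25)            # GP support states
U = np.linspace(-2.0, 2.0, 11)            # candidate actions
V = np.zeros_like(X)                      # terminal values V_N(x) = 0
kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=1e-4)

for k in range(30):                       # backward dynamic-programming sweeps
    # Fit a GP to the current value estimates so V can be evaluated at
    # arbitrary continuous states, not only at the support set.
    gp_v = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp_v.fit(X.reshape(-1, 1), V)

    V_new = np.empty_like(V)
    for i, x in enumerate(X):
        # Bellman backup: Q(x, u) = r(x, u) + gamma * V(f(x, u)), maximized over u.
        x_next = f(x, U).reshape(-1, 1)
        q = r(x, U) + gamma * gp_v.predict(x_next)
        V_new[i] = q.max()
    V = V_new

In the full algorithm the transition dynamics themselves are also modeled by Gaussian processes (the RL setting in the abstract), and Bayesian active learning selects where to collect data; neither is reproduced in this sketch.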

Author(s): Deisenroth, MP. and Rasmussen, CE. and Peters, J.
Journal: Neurocomputing
Volume: 72
Number (issue): 7-9
Pages: 1508-1524
Year: 2009
Month: March
Bibtex Type: Article (article)
DOI: 10.1016/j.neucom.2008.12.019
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@article{5531,
  title = {Gaussian Process Dynamic Programming},
  journal = {Neurocomputing},
  abstract = {Reinforcement learning (RL) and optimal control of systems with
  continuous states and actions require approximation techniques in most
  interesting cases. In this article, we introduce Gaussian process dynamic
  programming (GPDP), an approximate value-function-based RL algorithm. We
  consider both a classic optimal control problem, where problem-specific
  prior knowledge is available, and a classic RL problem, where only very
  general priors can be used. For the classic optimal control problem, GPDP
  models the unknown value functions with Gaussian processes and generalizes
  dynamic programming to continuous-valued states and actions. For the RL
  problem, GPDP starts from a given initial state and explores the state
  space using Bayesian active learning. To design a fast learner, available
  data has to be used efficiently. Hence, we propose to learn probabilistic
  models of the a priori unknown transition dynamics and the value functions
  on the fly. In both cases, we successfully apply the resulting
  continuous-valued controllers to the under-actuated pendulum swing-up and
  analyze the performance of the suggested algorithms. It turns out that GPDP
  uses data very efficiently and can be applied to problems where classic
  dynamic programming would be cumbersome.},
  volume = {72},
  number = {7-9},
  pages = {1508-1524},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = mar,
  year = {2009},
  slug = {5531},
  author = {Deisenroth, MP. and Rasmussen, CE. and Peters, J.},
  month_numeric = {3}
}