
Approximate Dynamic Programming with Gaussian Processes

In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and strongly depends on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP is able to yield an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing-up, a complex nonlinear control problem.
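To make the recursion in the abstract concrete, the sketch below illustrates the general idea of GP-based value iteration in Python: the value function is known only at a finite set of support states, and a Gaussian process regressor generalizes it to the successor states queried during each Bellman backup. The toy dynamics, the quadratic cost, and the use of scikit-learn's GaussianProcessRegressor are illustrative assumptions, not the authors' implementation, which additionally learns GP models of the dynamics and a switching policy over the full state space.

# Minimal sketch of a GPDP-style Bellman recursion (illustrative only, not the paper's code).
# Assumptions: toy 1-D dynamics, quadratic stage cost, scikit-learn GPs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def dynamics(x, u):
    # hypothetical deterministic toy dynamics x_{k+1} = f(x_k, u_k)
    return 0.9 * x + u

def cost(x, u):
    # stage cost of an LQ-style problem
    return x**2 + 0.1 * u**2

states = np.linspace(-3.0, 3.0, 25)    # finite support set of states
controls = np.linspace(-1.0, 1.0, 11)  # candidate control values
V = np.zeros_like(states)              # terminal value V_N = 0

for _ in range(20):                    # backward Bellman recursion
    # model the current value function with a GP over the support states
    gp_V = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(1e-4))
    gp_V.fit(states.reshape(-1, 1), V)

    V_new = np.empty_like(V)
    policy = np.empty_like(V)
    for i, x in enumerate(states):
        # evaluate the cost-to-go for each candidate control using the GP mean
        x_next = dynamics(x, controls)
        q = cost(x, controls) + gp_V.predict(x_next.reshape(-1, 1))
        j = np.argmin(q)
        V_new[i], policy[i] = q[j], controls[j]
    V = V_new

# 'policy' now holds optimal controls on the finite state set only; the paper then
# generalizes it to the whole state space with two GPs selected by a binary classifier.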

Author(s): Deisenroth, M. P., Peters, J. and Rasmussen, C. E.
Book Title: ACC 2008
Journal: Proceedings of the 2008 American Control Conference (ACC 2008)
Pages: 4480-4485
Year: 2008
Month: June
Publisher: IEEE Service Center
Bibtex Type: Conference Paper (inproceedings)
Address: Piscataway, NJ, USA
Event Name: 2008 American Control Conference
Event Place: Seattle, WA, USA
Digital: 0
Electronic Archiving: grant_archive
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTex

@inproceedings{4975,
  title = {Approximate Dynamic Programming with Gaussian Processes},
  journal = {Proceedings of the 2008 American Control Conference (ACC 2008)},
  booktitle = {ACC 2008},
  abstract = {In general, it is difficult to determine an optimal
  closed-loop policy in nonlinear control problems with
  continuous-valued state and control domains. Hence, approximations
  are often inevitable. The standard method of discretizing
  states and controls suffers from the curse of dimensionality
  and strongly depends on the chosen temporal sampling rate. In
  this paper, we introduce Gaussian process dynamic programming
  (GPDP) and determine an approximate globally optimal
  closed-loop policy. In GPDP, value functions in the Bellman
  recursion of the dynamic programming algorithm are modeled
  using Gaussian processes. GPDP returns an optimal state feedback
  for a finite set of states. Based on these outcomes, we
  learn a possibly discontinuous closed-loop policy on the entire
  state space by switching between two independently trained
  Gaussian processes. A binary classifier selects one Gaussian
  process to predict the optimal control signal. We show that
  GPDP is able to yield an almost optimal solution to an LQ
  problem using few sample points. Moreover, we successfully
  apply GPDP to the underpowered pendulum swing-up, a
  complex nonlinear control problem.},
  pages = {4480-4485},
  publisher = {IEEE Service Center},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Piscataway, NJ, USA},
  month = jun,
  year = {2008},
  slug = {4975},
  author = {Deisenroth, M. P. and Peters, J. and Rasmussen, C. E.},
  month_numeric = {6}
}