Autonomous Motion Conference Paper 2010

Reinforcement learning of full-body humanoid motor skills

Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and their state and action spaces are continuous. Most reinforcement learning algorithms therefore become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI²), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and is numerically robust in high-dimensional learning problems. We demonstrate how PI² is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI² in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.
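As a rough illustration of the update rule behind PI² (a sketch, not code from the paper): each iteration samples noisy rollouts of the current policy parameters, scores each rollout with the trajectory cost, and replaces gradient-based improvement with a probability-weighted average of the exploration noise, where low-cost rollouts receive exponentially higher weight. The minimal sketch below uses hypothetical names and a single scalar cost per rollout in place of the per-time-step cost-to-go and basis-function weighting of the full algorithm:

```python
import numpy as np

def pi2_update(theta, cost_fn, rng, n_rollouts=20, noise_std=0.1, lam=1.0):
    """One simplified PI^2 iteration: cost-weighted averaging of exploration noise.

    cost_fn maps a parameter vector to a scalar rollout cost; lam is the
    softmax temperature. This drops the per-time-step weighting of the
    full algorithm and keeps only its core update.
    """
    # Sample exploration noise and evaluate each perturbed parameter vector.
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))
    costs = np.array([cost_fn(theta + e) for e in eps])
    # Softmax over negative costs: low-cost rollouts dominate the update.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Probability-weighted average of the noise, added to the parameters.
    return theta + w @ eps

# Toy usage: drive a quadratic "trajectory cost" toward zero.
rng = np.random.default_rng(0)
theta = np.ones(5)
for _ in range(200):
    theta = pi2_update(theta, lambda th: float(th @ th), rng)
print(float(theta @ theta))  # cost after learning, well below the initial 5.0
```

Because the update is just a weighted average of sampled noise, it needs no model of the dynamics and no gradient of the cost, which is what makes the approach tractable on a 34-DOF robot.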

Author(s): Stulp, F. and Buchli, J. and Theodorou, E. and Schaal, S.
Book Title: 2010 10th IEEE-RAS International Conference on Humanoid Robots (Humanoids)
Pages: 405-410
Year: 2010
Month: December
Day: 6-8
BibTeX Type: Conference Paper (inproceedings)
URL: http://www-clmc.usc.edu/publications/S/stulp-Humanoids2010.pdf
Cross Ref: p10414
Electronic Archiving: grant_archive
Note: clmc

BibTeX

@inproceedings{Stulp_HRIIC_2010,
  title = {Reinforcement learning of full-body humanoid motor skills},
  booktitle = {2010 10th IEEE-RAS International Conference on Humanoid Robots (Humanoids)},
  abstract = {Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and their state and action spaces are continuous. Most reinforcement learning algorithms therefore become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI$^2$), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and is numerically robust in high-dimensional learning problems. We demonstrate how PI$^2$ is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI$^2$ in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.},
  pages = {405-410},
  month = dec,
  year = {2010},
  note = {clmc},
  slug = {stulp_hriic_2010},
  author = {Stulp, F. and Buchli, J. and Theodorou, E. and Schaal, S.},
  crossref = {p10414},
  url = {http://www-clmc.usc.edu/publications/S/stulp-Humanoids2010.pdf},
  month_numeric = {12}
}