Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI^2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high-dimensional learning problems. We demonstrate how PI^2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI^2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.
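The core update behind a PI^2-style method can be sketched in a few lines: sample noisy rollouts of the policy parameters, evaluate their costs, and average the exploration noise weighted by a softmax over (negative, normalized) costs, so that low-cost rollouts dominate the update. The sketch below is a minimal, simplified illustration under that description only; it omits the per-time-step weighting and basis-function projection of the full algorithm, and all names, defaults, and the toy cost function are illustrative, not from the paper.

```python
import math
import random

def pi2_update(theta, cost_fn, n_rollouts=20, noise_std=0.1, h=10.0, rng=None):
    """One simplified PI^2-style parameter update (illustrative sketch).

    Samples `n_rollouts` Gaussian perturbations of `theta`, evaluates
    their costs, and returns theta plus the probability-weighted noise
    average, where probabilities come from a softmax over normalized
    negative costs (sensitivity `h`).
    """
    if rng is None:
        rng = random
    # Sample perturbed parameter vectors and evaluate their costs.
    eps = [[rng.gauss(0.0, noise_std) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([t + e for t, e in zip(theta, ek)]) for ek in eps]
    # Normalize costs to [0, 1] so the softmax temperature is scale-free.
    s_min, s_max = min(costs), max(costs)
    scale = max(s_max - s_min, 1e-12)
    # Softmax probabilities: low-cost rollouts receive high weight.
    w = [math.exp(-h * (s - s_min) / scale) for s in costs]
    z = sum(w)
    probs = [wi / z for wi in w]
    # The update is the probability-weighted average of the noise.
    return [t + sum(p * ek[i] for p, ek in zip(probs, eps))
            for i, t in enumerate(theta)]

# Toy quadratic cost with its minimum at theta = (1, -2).
cost = lambda th: (th[0] - 1.0) ** 2 + (th[1] + 2.0) ** 2

rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(200):
    theta = pi2_update(theta, cost, rng=rng)
```

Note how, consistent with the abstract, the only real tuning knob is the exploration noise: the cost normalization makes the softmax weighting scale-invariant, and no model of the system dynamics is used.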
Author(s): Stulp, F. and Buchli, J. and Theodorou, E. and Schaal, S.
Book Title: Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on
Pages: 405-410
Year: 2010
Month: December
Day: 6-8
Bibtex Type: Conference Paper (inproceedings)
URL: http://www-clmc.usc.edu/publications/S/stulp-Humanoids2010.pdf
Cross Ref: p10414
Electronic Archiving: grant_archive
Note: clmc
BibTeX
@inproceedings{Stulp_HRIIC_2010,
  title = {Reinforcement learning of full-body humanoid motor skills},
  booktitle = {Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on},
  abstract = {Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI$^2$), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high-dimensional learning problems. We demonstrate how PI$^2$ is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI$^2$ in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.},
  pages = {405-410},
  month = dec,
  year = {2010},
  note = {clmc},
  slug = {stulp_hriic_2010},
  author = {Stulp, F. and Buchli, J. and Theodorou, E. and Schaal, S.},
  crossref = {p10414},
  url = {http://www-clmc.usc.edu/publications/S/stulp-Humanoids2010.pdf},
  month_numeric = {12}
}