Reinforcement learning by reward-weighted regression for operational space control

Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which is infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications to complex, high degree-of-freedom robots.
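For intuition, the core idea of reward-weighted regression can be sketched in a few lines: rewards are mapped to positive weights, and the policy parameters are refit by weighted least squares, iterated EM-style. Below is a minimal, illustrative Python sketch of one such loop, assuming a linear-Gaussian policy and an exponential reward transformation u(r) = exp(tau * r); the feature map and toy reward function are hypothetical stand-ins, not the paper's operational space control setup.

import numpy as np

rng = np.random.default_rng(0)

def features(s):
    # Illustrative feature map phi(s); a hypothetical choice for this sketch.
    return np.array([1.0, s, s ** 2])

def reward(s, a):
    # Toy immediate reward: penalize distance from a target action.
    return -(a - np.sin(s)) ** 2

theta = np.zeros(3)   # mean parameters of the linear-Gaussian policy
sigma = 0.5           # exploration noise (standard deviation)
tau = 5.0             # temperature of the reward transformation u(r) = exp(tau * r)

for _ in range(100):
    # Roll out: sample states and draw exploratory actions from the policy.
    states = rng.uniform(-1.0, 1.0, size=200)
    Phi = np.stack([features(s) for s in states])            # N x 3 design matrix
    actions = Phi @ theta + sigma * rng.standard_normal(200)
    rewards = np.array([reward(s, a) for s, a in zip(states, actions)])

    # Transform rewards into positive weights; subtract the max for stability.
    w = np.exp(tau * (rewards - rewards.max()))
    w /= w.sum()

    # Reward-weighted regression step: theta = (Phi^T W Phi)^-1 Phi^T W a.
    W = np.diag(w)
    theta = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ actions)

    # Re-estimate the exploration noise from the weighted residuals.
    residuals = actions - Phi @ theta
    sigma = np.sqrt(w @ residuals ** 2) + 1e-3

Each pass re-weights the samples so that high-reward actions dominate the fit, which is what lets the policy improve smoothly rather than by jumps through solution space.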

Author(s): Peters, J. and Schaal, S.
Book Title: Proceedings of the 24th Annual International Conference on Machine Learning
Pages: 745-750
Year: 2007
Bibtex Type: Conference Paper (inproceedings)
DOI: 10.1145/1273496.1273590
Event Name: ICML 2007
Event Place: Corvallis, OR, USA
URL: http://www-clmc.usc.edu/publications//P/peters_ICML2007.pdf
Cross Ref: p2675
Electronic Archiving: grant_archive
Note: clmc

BibTeX

@inproceedings{Peters_PICML_2007,
  title = {Reinforcement learning by reward-weighted regression for operational space control},
  booktitle = {Proceedings of the 24th Annual International Conference on Machine Learning},
  abstract = {Many robot control problems of practical importance, including
  operational space control, can be reformulated as immediate reward
  reinforcement learning problems. However, few of the known
  optimization or reinforcement learning algorithms can be used in
  online learning control for robots, as they are either prohibitively
  slow, do not scale to interesting domains of complex robots, or
  require trying out policies generated by random search, which is
  infeasible for a physical system. Using a generalization of the
  EM-based reinforcement learning framework suggested by Dayan \&
  Hinton, we reduce the problem of learning with immediate rewards to a
  reward-weighted regression problem with an adaptive, integrated reward
  transformation for faster convergence. The resulting algorithm is
  efficient, learns smoothly without dangerous jumps in solution space,
  and works well in applications to complex, high degree-of-freedom robots.},
  pages = {745--750},
  doi = {10.1145/1273496.1273590},
  year = {2007},
  note = {clmc},
  slug = {peters_picml_2007},
  author = {Peters, J. and Schaal, S.},
  crossref = {p2675},
  url = {http://www-clmc.usc.edu/publications//P/peters_ICML2007.pdf}
}