
Using reward-weighted regression for reinforcement learning of task space control

In this paper, we evaluate different versions of the three main kinds of model-free policy gradient methods, i.e., finite-difference gradients, 'vanilla' policy gradients, and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart-pole regulator benchmark, we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both the plant and the algorithms; thus, the results in this paper can be re-evaluated and reused, and new algorithms can be inserted with ease.
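As a rough illustration of the simplest of the three method families compared above, the C++ sketch below estimates the gradient of the expected return by central finite differences and performs plain gradient ascent. The rollout() function is a hypothetical toy stand-in (a quadratic objective with a known optimum), not the cart-pole plant or the portable C++ code that accompanies the paper; parameter values are likewise illustrative.

// Sketch of a finite-difference policy gradient update (illustrative only).
#include <cstdio>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for the expected return J(theta) of a
// parameterized policy. In the benchmark this would run cart-pole rollouts.
double rollout(const std::vector<double>& theta) {
    // Toy objective with optimum at theta = (1, -2), used only so that the
    // sketch runs end to end.
    double a = theta[0] - 1.0, b = theta[1] + 2.0;
    return -(a * a + b * b);
}

// Central finite-difference estimate of the gradient of J at theta.
std::vector<double> finiteDifferenceGradient(const std::vector<double>& theta,
                                             double eps) {
    std::vector<double> grad(theta.size(), 0.0);
    for (std::size_t i = 0; i < theta.size(); ++i) {
        std::vector<double> plus = theta, minus = theta;
        plus[i]  += eps;
        minus[i] -= eps;
        grad[i] = (rollout(plus) - rollout(minus)) / (2.0 * eps);
    }
    return grad;
}

int main() {
    std::vector<double> theta = {0.0, 0.0};  // initial policy parameters
    const double eps = 1e-2;                 // perturbation size
    const double alpha = 0.1;                // learning rate

    for (int iter = 0; iter < 100; ++iter) {
        std::vector<double> grad = finiteDifferenceGradient(theta, eps);
        for (std::size_t i = 0; i < theta.size(); ++i)
            theta[i] += alpha * grad[i];     // gradient ascent on J(theta)
    }
    std::printf("theta = (%f, %f), J = %f\n", theta[0], theta[1], rollout(theta));
    return 0;
}

In practice each rollout() evaluation would be a noisy Monte-Carlo estimate from sampled trajectories, so the perturbation size and learning rate trade off bias against variance; the 'vanilla' and natural policy gradient methods discussed in the paper avoid explicit parameter perturbations altogether.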

Author(s): Peters, J. and Schaal, S.
Book Title: Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Pages: 262-267
Year: 2007
Bibtex Type: Conference Paper (inproceedings)
Address: Honolulu, Hawaii, April 1-5, 2007
DOI: 10.1109/ADPRL.2007.368197
URL: http://www-clmc.usc.edu/publications/P/peters-ADPRL2007.pdf
Cross Ref: p2672
Electronic Archiving: grant_archive
Note: clmc

BibTeX

@inproceedings{Peters_PIISADPRL_2007,
  title = {Using reward-weighted regression for reinforcement learning of task space control},
  booktitle = {Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning},
  abstract = {In this paper, we evaluate different versions of the three main kinds of model-free policy gradient methods, i.e., finite-difference gradients, `vanilla' policy gradients, and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart-pole regulator benchmark, we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both the plant and the algorithms; thus, the results in this paper can be re-evaluated and reused, and new algorithms can be inserted with ease.},
  pages = {262-267},
  address = {Honolulu, Hawaii, April 1-5, 2007},
  year = {2007},
  note = {clmc},
  slug = {peters_piisadprl_2007},
  author = {Peters, J. and Schaal, S.},
  crossref = {p2672},
  doi = {10.1109/ADPRL.2007.368197},
  url = {http://www-clmc.usc.edu/publications/P/peters-ADPRL2007.pdf}
}