Autonomous Motion Empirical Inference Conference Paper 2007

Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark

In this paper, we evaluate different versions of the three main kinds of model-free policy gradient methods, i.e., finite-difference gradients, 'vanilla' policy gradients, and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart-pole regulator benchmark, we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated and reused, and new algorithms can be inserted with ease.

Author(s): Riedmiller, M. and Peters, J. and Schaal, S.
Book Title: Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Pages: 254-261
Year: 2007
Bibtex Type: Conference Paper (inproceedings)
Event Name: ADPRL 2007
Event Place: Honolulu, Hawaii
Cross Ref: p2654
Electronic Archiving: grant_archive
Note: clmc

BibTeX

@inproceedings{Riedmiller_PIISADPRL_2007,
  title = {Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark},
  booktitle = {Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning},
  abstract = {In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.},
  pages = {254--261},
  year = {2007},
  note = {clmc},
  slug = {riedmiller_piisadprl_2007},
  author = {Riedmiller, M. and Peters, J. and Schaal, S.},
  crossref = {p2654}
}