Empirical Inference
Article
2011
Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments.
Author(s): Hachiya, H. and Peters, J. and Sugiyama, M.
Journal: Neural Computation
Volume: 23
Number (issue): 11
Pages: 2798-2832
Year: 2011
Month: November
Day: | 0 |
Bibtex Type: Article
DOI: 10.1162/NECO_a_00199
BibTeX
@article{HachiyaPS2011,
  title    = {Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning},
  author   = {Hachiya, H. and Peters, J. and Sugiyama, M.},
  journal  = {Neural Computation},
  abstract = {Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments.},
  volume   = {23},
  number   = {11},
  pages    = {2798--2832},
  month    = nov,
  year     = {2011},
  doi      = {10.1162/NECO_a_00199}
}
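For a concrete feel for the idea described in the abstract, the following is a minimal sketch of reward-weighted regression (RWR) with importance-weighted sample reuse for a linear-Gaussian policy. This is an illustration of the general technique, not the authors' exact R3 algorithm (the paper additionally controls the variance of the importance weights, which the plain ratio below does not); the function name `rwr_update`, the NumPy setting, and the linear-Gaussian policy parameterization are assumptions made for this example.

```python
import numpy as np

def rwr_update(phi, actions, rewards, behavior_logp, current_logp):
    """One reward-weighted regression (RWR) update with sample reuse.

    phi           : (n, d) state features
    actions       : (n,)   actions taken by the behavior policy
    rewards       : (n,)   nonnegative returns used as regression weights
    behavior_logp : (n,)   log pi_behavior(a|s) for the logged samples
    current_logp  : (n,)   log pi_current(a|s) for the same samples

    Returns (theta, sigma) for a linear-Gaussian policy
    pi(a|s) = N(a | theta^T phi(s), sigma^2).
    """
    # Importance weights correct for the mismatch between the policy
    # that collected the samples and the policy being updated.
    iw = np.exp(current_logp - behavior_logp)

    # RWR's M-step is a weighted maximum-likelihood fit with
    # reward-proportional weights; sample reuse multiplies in iw.
    w = rewards * iw
    wphi = w[:, None] * phi

    # Weighted least squares: theta = (Phi^T W Phi)^{-1} Phi^T W a
    theta = np.linalg.solve(phi.T @ wphi, wphi.T @ actions)

    # Weighted residual variance gives the exploration noise.
    resid = actions - phi @ theta
    sigma = np.sqrt(np.sum(w * resid ** 2) / np.sum(w))
    return theta, sigma

if __name__ == "__main__":
    # Toy usage: 100 logged transitions with 3 state features.
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(100, 3))
    actions = rng.normal(size=100)
    rewards = rng.uniform(size=100)  # assumed nonnegative, as RWR requires
    logp_old = rng.normal(size=100)
    logp_new = logp_old + 0.1 * rng.normal(size=100)
    theta, sigma = rwr_update(phi, actions, rewards, logp_old, logp_new)
    print(theta, sigma)
```

The key design point the sketch tries to show is that reusing off-policy samples reduces to multiplying the reward weights by the likelihood ratio of the current and behavior policies, after which the EM-style update remains an ordinary weighted regression.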