Empirical Inference
Article
2011
Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments.
Author(s): Hachiya, H. and Peters, J. and Sugiyama, M.
Journal: Neural Computation
Volume: 23
Number (issue): 11
Pages: 2798-2832
Year: 2011
Month: November
Day: | 0 |
Bibtex Type: Article
DOI: 10.1162/NECO_a_00199
BibTeX
@article{HachiyaPS2011,
  title    = {Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning},
  author   = {Hachiya, H. and Peters, J. and Sugiyama, M.},
  journal  = {Neural Computation},
  abstract = {Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments.},
  volume   = {23},
  number   = {11},
  pages    = {2798--2832},
  month    = nov,
  year     = {2011},
  doi      = {10.1162/NECO_a_00199}
}
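For a concrete feel for the idea described in the abstract, the following is a minimal sketch of reward-weighted regression (RWR) with importance-weighted sample reuse for a linear-Gaussian policy. This is an illustration of the general technique, not the authors' exact R3 algorithm (the paper additionally controls the variance of the importance weights, which the plain ratio below does not); the function name `rwr_update`, the NumPy setting, and the linear-Gaussian policy parameterization are assumptions made for this example.

```python
import numpy as np

def rwr_update(phi, actions, rewards, behavior_logp, current_logp):
    """One reward-weighted regression (RWR) update with sample reuse.

    phi           : (n, d) state features
    actions       : (n,)   actions taken by the behavior policy
    rewards       : (n,)   nonnegative returns used as regression weights
    behavior_logp : (n,)   log pi_behavior(a|s) for the logged samples
    current_logp  : (n,)   log pi_current(a|s) for the same samples

    Returns (theta, sigma) for a linear-Gaussian policy
    pi(a|s) = N(a | theta^T phi(s), sigma^2).
    """
    # Importance weights correct for the mismatch between the policy
    # that collected the samples and the policy being updated.
    iw = np.exp(current_logp - behavior_logp)

    # RWR's M-step is a weighted maximum-likelihood fit with
    # reward-proportional weights; sample reuse multiplies in iw.
    w = rewards * iw
    wphi = w[:, None] * phi

    # Weighted least squares: theta = (Phi^T W Phi)^{-1} Phi^T W a
    theta = np.linalg.solve(phi.T @ wphi, wphi.T @ actions)

    # Weighted residual variance gives the exploration noise.
    resid = actions - phi @ theta
    sigma = np.sqrt(np.sum(w * resid ** 2) / np.sum(w))
    return theta, sigma

if __name__ == "__main__":
    # Toy usage: 100 logged transitions with 3 state features.
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(100, 3))
    actions = rng.normal(size=100)
    rewards = rng.uniform(size=100)  # assumed nonnegative, as RWR requires
    logp_old = rng.normal(size=100)
    logp_new = logp_old + 0.1 * rng.normal(size=100)
    theta, sigma = rwr_update(phi, actions, rewards, logp_old, logp_new)
    print(theta, sigma)
```

The key design point the sketch tries to show is that reusing off-policy samples reduces to multiplying the reward weights by the likelihood ratio of the current and behavior policies, after which the EM-style update remains an ordinary weighted regression.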