Empirical Inference · Conference Paper · 2009

Fitted Q-iteration by Advantage Weighted Regression

Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, more stable learning process, and higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.
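The abstract summarizes the method at a high level; the following is a minimal Python sketch of what an FQI loop with advantage-weighted policy improvement can look like. The toy 1-D task, the quadratic features, the exponential soft-greedy weighting with temperature beta, and all variable names are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
gamma, beta, n = 0.95, 1.0, 2000     # discount factor, soft-greedy temperature, sample count (assumed values)

# Synthetic transitions (s, a, r, s') from a toy 1-D task whose optimal action is a = -s.
s = rng.uniform(-1.0, 1.0, n)
a = rng.uniform(-1.0, 1.0, n)
r = -(s + a) ** 2
s_next = np.clip(s + a, -1.0, 1.0)

def features(s, a):
    # quadratic features for a linear Q-function approximator
    return np.stack([np.ones_like(s), s, a, s * s, a * a, s * a], axis=1)

theta = np.zeros(6)      # Q-function weights
policy = np.zeros(2)     # deterministic linear policy: a = policy[0] + policy[1] * s

for _ in range(20):
    # Fitted Q-iteration step: regress Q(s, a) onto the one-step targets r + gamma * Q(s', pi(s')).
    a_pi = policy[0] + policy[1] * s_next
    target = r + gamma * features(s_next, a_pi) @ theta
    X = features(s, a)
    theta = np.linalg.lstsq(X, target, rcond=None)[0]

    # Policy improvement by advantage-weighted regression: fit the policy to the observed
    # actions, weighting each sample by a soft-greedy function of its advantage.
    v = features(s, policy[0] + policy[1] * s) @ theta        # baseline V(s) ~= Q(s, pi(s))
    advantage = X @ theta - v
    w = np.exp((advantage - advantage.max()) / beta)          # soft-greedy weights
    Phi = np.stack([np.ones_like(s), s], axis=1)
    A_w = Phi.T @ (Phi * w[:, None]) + 1e-6 * np.eye(2)       # weighted normal equations
    policy = np.linalg.solve(A_w, Phi.T @ (w * a))

print("learned policy: a =", round(policy[0], 3), "+", round(policy[1], 3), "* s  (optimum: a = -s)")

Note that no explicit maximization over actions appears anywhere in the sketch: the policy update is a single weighted least-squares fit, which is the kind of inexpensive policy improvement step the abstract refers to.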

Author(s): Neumann, G. and Peters, J.
Book Title: Advances in Neural Information Processing Systems 21
Journal: Advances in Neural Information Processing Systems 21: 22nd Annual Conference on Neural Information Processing Systems 2008
Pages: 1177-1184
Year: 2009
Month: June
Editors: Koller, D., Schuurmans, D., Bengio, Y. and Bottou, L.
Publisher: Curran
BibTeX Type: Conference Paper (inproceedings)
Address: Red Hook, NY, USA
Event Name: Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008)
Event Place: Vancouver, BC, Canada
ISBN: 978-1-605-60949-2
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{5520,
  title = {Fitted Q-iteration by Advantage Weighted Regression},
  journal = {Advances in neural information processing systems 21 : 22nd Annual Conference on Neural Information Processing Systems 2008},
  booktitle = {Advances in neural information processing systems 21},
  abstract = {Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, more stable learning process, and higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.},
  pages = {1177-1184},
  editor = {Koller, D. and Schuurmans, D. and Bengio, Y. and Bottou, L.},
  publisher = {Curran},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Red Hook, NY, USA},
  month = jun,
  year = {2009},
  slug = {5520},
  author = {Neumann, G. and Peters, J.},
  month_numeric = {6}
}