
Bootstrapping Apprenticeship Learning

We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimation to approximate the expected feature counts under the expert's policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal and (ii) the dynamics of the system are known. Empirical results on gridworlds and car racing problems show that our approach is able to learn good policies from a small number of demonstrations.
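
The central quantity in the abstract is the expected feature count of the expert's policy, mu(pi) = E[ sum_t gamma^t phi(s_t, a_t) ]. Below is a minimal sketch of the simple Monte Carlo estimator that, per the abstract, most IRL algorithms use; the function names and the trajectory format are illustrative assumptions, not taken from the paper.

import numpy as np

def monte_carlo_feature_counts(trajectories, feature_fn, gamma=0.99):
    """Average discounted feature counts over demonstration rollouts.

    trajectories: list of rollouts, each a list of (state, action) pairs
    feature_fn:   maps (state, action) -> 1-D feature vector phi(s, a)
    Returns a Monte Carlo estimate of mu(pi) = E[ sum_t gamma^t phi(s_t, a_t) ].
    """
    estimates = []
    for traj in trajectories:
        total = None
        for t, (s, a) in enumerate(traj):
            phi = np.asarray(feature_fn(s, a), dtype=float)
            term = (gamma ** t) * phi
            total = term if total is None else total + term
        estimates.append(total)
    # Empirical mean over rollouts approximates the expectation.
    return np.mean(estimates, axis=0)

# Toy usage: two short demonstrations with a hypothetical 2-D feature map.
demos = [[((0, 0), "right"), ((0, 1), "down")],
         [((0, 0), "down")]]
phi = lambda s, a: np.array([s[0], s[1]], dtype=float)
print(monte_carlo_feature_counts(demos, phi, gamma=0.9))

When the demonstrations cover only a small part of the state space, this sample average is a high-variance estimate of mu(pi); that estimation error is precisely what the paper's bootstrapping procedure, which exploits the known dynamics and the (near-)optimality of the expert, is designed to reduce.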

Author(s): Boularias, A. and Chaib-Draa, B.
Book Title: Advances in Neural Information Processing Systems 23
Pages: 289-297
Year: 2010
Editors: J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, A. Culotta
Publisher: Curran Associates, Inc.
Bibtex Type: Conference Paper (inproceedings)
Address: Red Hook, NY, USA
Event Name: Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS 2010)
Event Place: Vancouver, BC, Canada
ISBN: 978-1-617-82380-0
Language: en

BibTeX

@inproceedings{6826,
  title = {Bootstrapping Apprenticeship Learning},
  booktitle = {Advances in Neural Information Processing Systems 23},
  abstract = {We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimation to approximate the expected feature counts under the expert's policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal and (ii) the dynamics of the system are known. Empirical results on gridworlds and car racing problems show that our approach is able to learn good policies from a small number of demonstrations.},
  pages = {289--297},
  editor = {Lafferty, J. and Williams, C. K. I. and Shawe-Taylor, J. and Zemel, R. S. and Culotta, A.},
  publisher = {Curran Associates, Inc.},
  address = {Red Hook, NY, USA},
  year = {2010},
  slug = {6826},
  author = {Boularias, A. and Chaib-Draa, B.}
}