Autonomous Motion Empirical Inference Conference Paper 2007

Reinforcement learning for operational space control

While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in the face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting supervised learning problem is ill-defined as it requires learning an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-convexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. The important insight that many operational space control algorithms can be reformulated as optimal control problems, however, allows addressing this inverse learning problem in the framework of reinforcement learning. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
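The idea of reducing immediate-reward learning to reward-weighted regression can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional linear policy, the quadratic `example_cost`, the fixed exponential transformation with the hypothetical parameter `beta`, and all other names and values below are assumptions chosen for illustration. In particular, the adaptive reward transformation mentioned in the abstract is replaced here by a fixed one.

```python
import math
import random


def reward_weighted_regression(cost_fn, n_iters=30, n_samples=200,
                               sigma=0.5, beta=5.0, seed=0):
    """Fit a linear policy u = theta * x by repeated weighted least squares.

    Illustrative assumptions: scalar state and action, Gaussian exploration
    noise with standard deviation `sigma`, and a fixed transformation
    w = exp(-beta * cost) that turns costs into positive weights.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_iters):
        num = 0.0
        den = 0.0
        for _ in range(n_samples):
            x = rng.uniform(-1.0, 1.0)
            # Explore around the current policy instead of searching at random.
            u = theta * x + rng.gauss(0.0, sigma)
            # Low-cost samples receive large weights, high-cost samples small ones.
            w = math.exp(-beta * cost_fn(x, u))
            num += w * x * u
            den += w * x * x
        # Closed-form weighted least-squares solution for a single parameter.
        theta = num / den
    return theta


def example_cost(x, u):
    """Hypothetical immediate cost: squared distance to the target action 2 * x."""
    return (u - 2.0 * x) ** 2


theta = reward_weighted_regression(example_cost)
print(f"learned gain: {theta:.3f}")
```

Because each update is a regression on actions sampled near the current policy, the parameter moves gradually toward low-cost actions, which loosely mirrors the smooth learning behavior described in the abstract.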

Author(s): Peters, J. and Schaal, S.
Book Title: Proceedings of the 2007 IEEE International Conference on Robotics and Automation
Pages: 2111-2116
Year: 2007
Publisher: IEEE Computer Society
BibTeX Type: Conference Paper (inproceedings)
DOI: 10.1109/ROBOT.2007.363633
Event Name: ICRA 2007
Event Place: Roma, Italy
URL: http://www-clmc.usc.edu/publications/P/peters-ICRA2007.pdf
Cross Ref: p2670
Electronic Archiving: grant_archive
Note: clmc

BibTeX

@inproceedings{Peters_ICRA_2007,
  title = {Reinforcement learning for operational space control},
  booktitle = {Proceedings of the 2007 IEEE International Conference on Robotics and Automation},
  abstract = {While operational space control is of essential importance
  for robotics and well-understood from an analytical
  point of view, it can be prohibitively hard to achieve accurate
  control in the face of modeling errors, which are inevitable in
  complex robots, e.g., humanoid robots. In such cases, learning
  control methods can offer an interesting alternative to analytical
  control algorithms. However, the resulting supervised learning
  problem is ill-defined as it requires learning an inverse mapping
  of a usually redundant system, which is well known to suffer
  from the property of non-convexity of the solution space, i.e.,
  the learning system could generate motor commands that try
  to steer the robot into physically impossible configurations. The
  important insight that many operational space control algorithms
  can be reformulated as optimal control problems, however, allows
  addressing this inverse learning problem in the framework of
  reinforcement learning. However, few of the known optimization
  or reinforcement learning algorithms can be used in online
  learning control for robots, as they are either prohibitively
  slow, do not scale to interesting domains of complex robots,
  or require trying out policies generated by random search,
  which are infeasible for a physical system. Using a generalization
  of the EM-based reinforcement learning framework suggested
  by Dayan \& Hinton, we reduce the problem of learning with
  immediate rewards to a reward-weighted regression problem
  with an adaptive, integrated reward transformation for faster
  convergence. The resulting algorithm is efficient, learns smoothly
  without dangerous jumps in solution space, and works well in
  applications of complex high degree-of-freedom robots.},
  pages = {2111--2116},
  publisher = {IEEE Computer Society},
  year = {2007},
  doi = {10.1109/ROBOT.2007.363633},
  note = {clmc},
  slug = {peters_icra_2007},
  author = {Peters, J. and Schaal, S.},
  crossref = {p2670},
  url = {http://www-clmc.usc.edu/publications/P/peters-ICRA2007.pdf}
}