Variable impedance control - a reinforcement learning approach | Max Planck Institute for Intelligent Systems


Autonomous Motion Conference Paper 2010



One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high-DOF robotic tasks. In this contribution, we accomplish such gain scheduling with a reinforcement learning algorithm, PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that RL on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling. We evaluate our approach by presenting results on two different simulated robotic systems, a 3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight Robot. We investigate tasks where the optimal strategy requires both tuning of the impedance of the end-effector and tuning of a reference trajectory. The results show that we can use path-integral-based RL not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
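The abstract describes PI2 as model-free and sampling-based, with exploration noise as the main quantity to tune. The Python sketch below illustrates that general idea only: perturb the parameters with noise, score each rollout with a cost, and update with a cost-weighted average of the noise. It is not the authors' implementation. It collapses each rollout to a single scalar cost, and the names `pi2_style_update` and `quadratic_cost`, the sensitivity constant `h`, and the min-max cost normalization are assumptions made for illustration.

```python
import math
import random


def quadratic_cost(theta, target=(1.0, -2.0)):
    """Toy cost (an assumption): squared distance to a fixed target."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def pi2_style_update(theta, cost_fn, n_rollouts=20, noise_std=0.5, h=10.0, rng=None):
    """One simplified, episode-level update in the spirit of PI2."""
    rng = rng or random.Random(0)

    # Sample exploration noise and evaluate each perturbed parameter vector.
    noises = [[rng.gauss(0.0, noise_std) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([t + e for t, e in zip(theta, eps)]) for eps in noises]

    # Map costs to probabilities: lower cost gets a larger weight.
    # Costs are min-max normalized so that h is scale-free (assumed choice).
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # The update is the probability-weighted average of the sampled noise.
    delta = [sum(p * eps[i] for p, eps in zip(probs, noises))
             for i in range(len(theta))]
    return [t + d for t, d in zip(theta, delta)]
```

Calling `pi2_style_update` repeatedly with a shared `random.Random` instance should drive the toy cost down from its starting value. In a gain-scheduling setting, `theta` would instead hold the parameters of the reference trajectory and the gain schedule, and `cost_fn` would run a rollout on the simulated robot; both of those are left out of this sketch.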

Author(s):	Buchli, J. and Theodorou, E. and Stulp, F. and Schaal, S.
Book Title:	Robotics Science and Systems (2010)
Year:	2010

Bibtex Type:	Conference Paper (inproceedings)

Address:	Zaragoza, Spain, June 27-30
URL:	http://www-clmc.usc.edu/publications/B/buchli-RSS2010.pdf

Cross Ref:	p10423
Electronic Archiving:	grant_archive
Note:	clmc

BibTeX

@inproceedings{Buchli_RSS_2010,
  title = {Variable impedance control - a reinforcement learning approach},
  booktitle = {Robotics Science and Systems (2010)},
  abstract = {One of the hallmarks of the performance, versatility,
  and robustness of biological motor control is the ability to adapt
  the impedance of the overall biomechanical system to different
  task requirements and stochastic disturbances. A transfer of this
  principle to robotics is desirable, for instance to enable robots
  to work robustly and safely in everyday human environments. It
  is, however, not trivial to derive variable impedance controllers
  for practical high-DOF robotic tasks. In this contribution, we accomplish
  such gain scheduling with a reinforcement learning
  algorithm, PI2 (Policy Improvement with Path Integrals).
  PI2 is a model-free, sampling-based learning method derived from
  first principles of optimal control. The PI2 algorithm requires no
  tuning of algorithmic parameters besides the exploration noise.
  The designer can thus fully focus on cost function design to
  specify the task. From the viewpoint of robotics, a particularly
  useful property of PI2 is that it can scale to problems of many
  DOFs, so that RL on real robotic systems becomes feasible. We
  sketch the PI2 algorithm and its theoretical properties, and how
  it is applied to gain scheduling. We evaluate our approach by
  presenting results on two different simulated robotic systems, a
  3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight
  Robot. We investigate tasks where the optimal strategy requires
  both tuning of the impedance of the end-effector, and tuning
  of a reference trajectory. The results show that we can use
  path-integral-based RL not only for planning but also to derive
  variable gain feedback controllers in realistic scenarios. Thus,
  the power of variable impedance control is made available to a
  wide variety of robotic systems and practical applications.},
  address = {Zaragoza, Spain, June 27-30},
  year = {2010},
  note = {clmc},
  slug = {buchli_rss_2010},
  author = {Buchli, J. and Theodorou, E. and Stulp, F. and Schaal, S.},
  crossref = {p10423},
  url = {http://www-clmc.usc.edu/publications/B/buchli-RSS2010.pdf}
}
