
Learning variable impedance control

One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI^2 (Policy Improvement with Path Integrals). PI^2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI^2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI^2 is that it can scale to problems of many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI^2 algorithm and its theoretical properties, and show how it is applied to gain scheduling for variable impedance control. We evaluate our approach by presenting results on several simulated and real robots. We consider tasks involving accurate tracking through via-points, and manipulation tasks requiring physical contact with the environment. In these tasks, the optimal strategy requires tuning of both a reference trajectory and the impedance of the end-effector. The results show that we can use path integral based reinforcement learning not only for planning but also to derive variable-gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
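
Since the abstract only names the algorithm, a minimal sketch of the core PI^2 update may help place it: the policy parameters are perturbed with exploration noise, one rollout is run per perturbation, and the noise samples are averaged with weights from a softmax over rollout costs. The Python snippet below is an illustrative sketch under that reading, not the authors' implementation; rollout_cost, the constants, and the gain parameterization in the closing note are assumptions for illustration.

import numpy as np

def pi2_update(theta, rollout_cost, n_rollouts=10, noise_std=0.1, h=10.0):
    # One PI^2-style update: perturb the parameters, evaluate each rollout,
    # and average the exploration noise weighted by a softmax over costs
    # (lower cost -> higher weight).
    eps = noise_std * np.random.randn(n_rollouts, theta.size)   # exploration noise
    costs = np.array([rollout_cost(theta + e) for e in eps])    # one rollout per sample
    s = (costs - costs.min()) / max(np.ptp(costs), 1e-12)       # normalize costs to [0, 1]
    weights = np.exp(-h * s)
    weights /= weights.sum()                                    # probability per rollout
    return theta + weights @ eps                                # cost-weighted noise average

In the gain-scheduling application described above, theta would parameterize time-varying gains of a feedback controller, e.g. u = -K_P(t) (q - q_ref(t)) - K_D(t) dq/dt, so that learning shapes both the reference trajectory and the end-effector impedance.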

Author(s): Buchli, J. and Stulp, F. and Theodorou, E. and Schaal, S.
Journal: International Journal of Robotics Research
Year: 2011
Bibtex Type: Article (article)
URL: http://www-clmc.usc.edu/publications/B/buchli-IJRR2011.pdf
Cross Ref: p10406
Electronic Archiving: grant_archive
Note: clmc

BibTeX

@article{Buchli_IJRR_2011,
  title = {Learning variable impedance control},
  journal = {International Journal of Robotics Research},
  abstract = {One of the hallmarks of the performance, versatility, and robustness
  of biological motor control is the ability to adapt the impedance of
  the overall biomechanical system to different task requirements and
  stochastic disturbances. A transfer of this principle to robotics is
  desirable, for instance to enable robots to work robustly and safely
  in everyday human environments. It is, however, not trivial to derive
  variable impedance controllers for practical high degree-of-freedom
  (DOF) robotic tasks.
  
  In this contribution, we accomplish such variable impedance control
  with the reinforcement learning (RL) algorithm PI$^2$ ({\bf P}olicy
  {\bf I}mprovement with {\bf P}ath {\bf I}ntegrals). PI$^2$ is a
  model-free, sampling-based learning method derived from first
  principles of stochastic optimal control. The PI$^2$ algorithm requires
  no tuning of algorithmic parameters besides the exploration noise. The
  designer can thus fully focus on cost function design to specify the
  task. From the viewpoint of robotics, a particularly useful property
  of PI$^2$ is that it can scale to problems of many DOFs, so that
  reinforcement learning on real robotic systems becomes feasible.
  We sketch the PI$^2$ algorithm and its theoretical properties, and show
  how it is applied to gain scheduling for variable impedance control.
  
  We evaluate our approach by presenting results on several simulated and real robots.
  We consider tasks involving accurate tracking through via-points, and manipulation tasks requiring physical contact with the environment.
  In these tasks, the optimal strategy requires tuning of both a reference trajectory \emph{and} the impedance of the end-effector.
  
  The results show that we can use path integral based reinforcement learning not only for
  planning but also to derive variable-gain feedback controllers in
  realistic scenarios. Thus, the power of variable impedance control
  is made available to a wide variety of robotic systems and practical
  applications.
  },
  year = {2011},
  note = {clmc},
  slug = {buchli_ijrr_2011},
  author = {Buchli, J. and Stulp, F. and Theodorou, E. and Schaal, S.},
  crossref = {p10406},
  url = {http://www-clmc.usc.edu/publications/B/buchli-IJRR2011.pdf}
}