
Behavioral experiments on reinforcement learning in human motor control

Reinforcement learning (RL), that is, learning based solely on reward or cost feedback, is widespread in robotics control and has also been suggested as a computational model for human motor control. In human motor control, however, hardly any experiments have studied reinforcement learning. Here, we studied learning based on visual cost feedback in a reaching task and performed three experiments: (1) to establish an experiment simple enough for RL, (2) to study the spatial localization of RL, and (3) to study the dependence of RL on the cost function.

In experiment (1), subjects sat in front of a drawing tablet and looked at a screen onto which the drawing pen's position was projected. Starting from a start point, their task was to move the pen through a target point presented on the screen. Visual feedback about the pen's position was given only before movement onset. At the end of a movement, subjects received visual feedback only about the cost of that trial. As cost, we chose the squared distance between the target and the virtual pen position at the target line; above a threshold value, the cost was fixed at that value. In the mapping of the pen's position onto the screen, we added a bias (unknown to the subject) and Gaussian noise. As a result, subjects could learn the bias and thus showed reinforcement learning.

In experiment (2), we randomly varied the target position among three locations (three different directions from the start point: -45, 0, and 45 degrees). For each direction, we chose a different bias. As a result, subjects learned all three bias values simultaneously. Thus, RL can be spatially localized.

In experiment (3), we varied the sensitivity of the cost function by multiplying the squared distance by a constant C while keeping the same cut-off threshold. As in experiment (2), we had three target locations. We assigned a different C value to each location (this assignment was randomized between subjects). Since subjects learned the three locations simultaneously, we could directly compare the effects of the different cost functions. As a result, we found an optimal C value: if C was too small (insensitive cost), learning was slow; if C was too large (narrow cost valley), the exploration time was longer and learning was delayed. Thus, reinforcement learning in human motor control appears to be sensitive to the shape of the cost function.
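
A minimal Python sketch of a single trial's feedback under the scheme described above. The parameter values (the hidden bias, the noise level, the sensitivity constant C, and the cut-off threshold) are hypothetical, since the abstract does not report them:

import numpy as np

rng = np.random.default_rng(0)

BIAS = 1.5        # hidden visuomotor offset, unknown to the subject (hypothetical value)
NOISE_SD = 0.3    # SD of the Gaussian noise added to the displayed position (hypothetical)
C = 1.0           # sensitivity of the cost function (varied in experiment 3)
COST_CAP = 10.0   # cut-off threshold above which the cost is fixed (hypothetical)

def displayed_position(pen_pos):
    # Map the pen's position onto the screen with a hidden bias plus Gaussian noise.
    return pen_pos + BIAS + rng.normal(0.0, NOISE_SD)

def trial_cost(pen_pos, target_pos):
    # Capped quadratic cost: C times the squared distance between the
    # virtual pen position and the target at the target line.
    err = displayed_position(pen_pos) - target_pos
    return min(C * err**2, COST_CAP)

# A subject who has not yet learned the bias aims straight at the target
# and is penalized for the hidden offset.
print(trial_cost(pen_pos=0.0, target_pos=0.0))

Since only this scalar cost is shown after each movement, reducing it requires inferring the hidden bias from reward feedback alone, which is what makes the task a reinforcement learning problem.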

Author(s): Hoffmann, H. and Theodorou, E. and Schaal, S.
Book Title: Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM)
Year: 2008
BibTeX Type: Conference Paper (inproceedings)
Address: Naples, Florida, April 29-May 4
Cross Ref: p3232
Electronic Archiving: grant_archive
Note: clmc

BibTeX

@inproceedings{Hoffmann_AEAMNCM_2008,
  title = {Behavioral experiments on reinforcement learning in human motor control},
  booktitle = {Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM)},
  abstract = {Reinforcement learning (RL), that is, learning based solely 
  on reward or cost feedback, is widespread in robotics control and has 
  also been suggested as a computational model for human motor control. 
  In human motor control, however, hardly any experiments have studied 
  reinforcement learning. Here, we studied learning based on visual cost 
  feedback in a reaching task and performed three experiments: (1) to 
  establish an experiment simple enough for RL, (2) to study the spatial 
  localization of RL, and (3) to study the dependence of RL on the cost 
  function.
  
  In experiment (1), subjects sat in front of a drawing tablet and looked 
  at a screen onto which the drawing pen's position was projected. 
  Starting from a start point, their task was to move the pen through a 
  target point presented on the screen. Visual feedback about the pen's 
  position was given only before movement onset. At the end of a movement, 
  subjects received visual feedback only about the cost of that trial. As 
  cost, we chose the squared distance between the target and the virtual 
  pen position at the target line; above a threshold value, the cost was 
  fixed at that value. In the mapping of the pen's position onto the 
  screen, we added a bias (unknown to the subject) and Gaussian noise. As 
  a result, subjects could learn the bias and thus showed reinforcement 
  learning.
  
  In experiment (2), we randomly varied the target position among three 
  locations (three different directions from the start point: -45, 0, and 
  45 degrees). For each direction, we chose a different bias. As a result, 
  subjects learned all three bias values simultaneously. Thus, RL can be 
  spatially localized.
  
  In experiment (3), we varied the sensitivity of the cost function by 
  multiplying the squared distance by a constant C while keeping the same 
  cut-off threshold. As in experiment (2), we had three target locations. 
  We assigned a different C value to each location (this assignment was 
  randomized between subjects). Since subjects learned the three locations 
  simultaneously, we could directly compare the effects of the different 
  cost functions. As a result, we found an optimal C value: if C was too 
  small (insensitive cost), learning was slow; if C was too large (narrow 
  cost valley), the exploration time was longer and learning was delayed. 
  Thus, reinforcement learning in human motor control appears to be 
  sensitive to the shape of the cost function.},
  address = {Naples, Florida, April 29-May 4},
  year = {2008},
  note = {clmc},
  slug = {hoffmann_aeamncm_2008},
  author = {Hoffmann, H. and Theodorou, E. and Schaal, S.},
  crossref = {p3232}
}