Miscellaneous 2021

Compositional generalization in multi-armed bandits

To what extent do human reward learning and decision-making rely on the ability to represent and generate richly structured relationships between options? We provide evidence that structure learning and the principle of compositionality play crucial roles in human reinforcement learning. In a new multi-armed bandit paradigm, termed the compositionally-structured multi-armed bandit task, we found evidence that participants are able to learn representations of different latent reward structures and combine them to make correct generalizations about options in novel contexts. Moreover, we found substantial evidence that participants transferred knowledge of simpler reward structures to make informed, compositional generalizations about rewards in complex contexts. We also provide a computational model which is able to generalize and compose knowledge of complex reward structures using a grammar over structures and show how such compositional inductive biases can be learned by meta-reinforcement learning agents.

Author(s): Schulz, E
Book Title: Psychologie und Gehirn (PuG 2021)
Year: 2021
BibTeX Type: Miscellaneous (misc)
Electronic Archiving: grant_archive

BibTeX

@misc{item_3321273,
  title = {{Compositional generalization in multi-armed bandits}},
  booktitle = {{Psychologie und Gehirn (PuG 2021)}},
  abstract = {{To what extent do human reward learning and decision-making rely on the ability to represent and generate richly structured relationships between options? We provide evidence that structure learning and the principle of compositionality play crucial roles in human reinforcement learning. In a new multi-armed bandit paradigm, termed the compositionally-structured multi-armed bandit task, we found evidence that participants are able to learn representations of different latent reward structures and combine them to make correct generalizations about options in novel contexts. Moreover, we found substantial evidence that participants transferred knowledge of simpler reward structures to make informed, compositional generalizations about rewards in complex contexts. We also provide a computational model which is able to generalize and compose knowledge of complex reward structures using a grammar over structures and show how such compositional inductive biases can be learned by meta-reinforcement learning agents.}},
  year = {2021},
  slug = {item_3321273},
  author = {Schulz, E}
}