Rewards and uncertainty jointly drive the attention dynamics in reinforcement learning
@misc{item_3169898,
  title = {{Rewards and uncertainty jointly drive the attention dynamics in reinforcement learning}},
  booktitle = {{Ninth International Symposium on Biology of Decision Making (SBDM 2019)}},
  abstract = {{Aim: The nature of attention, and how it interacts with learning and choice processes in the context of reinforcement learning, is still unclear. Probabilistic accounts of associative learning, as well as approximately optimal solutions of the exploration-exploitation dilemma, suggest that both learned values and uncertainty about those values (i.e., reducible or estimation uncertainty) are important for learning and choice. This implies that both factors should jointly guide attention. Our main goal was to test this prediction. Our secondary goal was to examine whether the relation between attention and reinforcement learning is bidirectional, that is, whether attention also influences or biases what we learn and how we choose. There are some tests of this direction of influence; however, the role of estimation uncertainty has not previously been addressed. Method: Participants (N = 36) completed two games in which they repeatedly chose between six options. Each game was a multi-armed bandit task in which rewards for each option were drawn from Gaussian distributions that differed in both their means and variances. The participants' goal was to maximize the cumulative sum of rewards in each game. To do this, they needed to explore the options in the choice set in order to learn which option had the highest average reward, and subsequently exploit this knowledge. We monitored participants' attention using eye tracking while they performed the tasks, operationalizing attention as the proportion of time spent fixating on each of the options before making a choice. Results: We relied on computational modeling to garner evidence for our two questions. To address our main question, we modeled attention with a combination of a Bayesian (Kalman filter) learning component and two types of choice rules: one that relies only on learned value (softmax) and one that additionally uses estimation uncertainty to assign an "exploration bonus" to the options (upper confidence bound rule). Model evidence showed that Kalman filter learning with the exploration bonus described overt attention best, providing evidence that trial-by-trial learned values and estimation uncertainty jointly guide visual attention. For our secondary question, we used the same models to model choices, but allowed measured attention to affect the choice process by increasing the probability of choosing attended options and decreasing it for unattended options. Attention was also allowed to modulate the magnitude of updates in the learning process. Again, we found that Kalman filter learning with the exploration bonus was the best model, showing that estimation uncertainty plays an independent role in determining choice, over and above its effect on attention. Conclusions: In summary, the interaction between attention, learning, and decision making extends further than previously found. Our results provide support for probabilistic associative learning accounts that ground attention in efficient computations rather than constraints, and establish a relation with approximately optimal resolutions of the exploration-exploitation trade-off.}},
  pages = {135--136},
  year = {2019},
  slug = {item_3169898},
  author = {Stojic, H and Orquin, JL and Dayan, P and Dolan, R and Speekenbrink, M}
}