Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency
Computational models for bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians filters and a nonlinear scheme that combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can be used to derive a saliency model from human eye movement data with a very small number of additional assumptions. The learned model is much simpler than previous models, but nevertheless has state-of-the-art prediction performance [2]. A central result from this study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes into account temporal intensity variations. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.
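The center-surround filters mentioned in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a generic Difference-of-Gaussians (DoG) kernel, with all sizes and sigmas chosen arbitrarily for illustration, showing how a narrow excitatory center minus a broad inhibitory surround yields a filter that responds to local contrast:

```python
import numpy as np

def dog_kernel(size, sigma_c, sigma_s):
    """Difference-of-Gaussians kernel: normalized narrow center
    Gaussian minus normalized broad surround Gaussian."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_c**2))
    center /= center.sum()
    surround = np.exp(-r2 / (2 * sigma_s**2))
    surround /= surround.sum()
    return center - surround  # sums to ~0: flat regions give no response

# Toy image with a single bright spot at the center; the DoG
# response at that location is positive (local contrast detected).
img = np.zeros((9, 9))
img[4, 4] = 1.0
k = dog_kernel(9, sigma_c=1.0, sigma_s=3.0)
response = float((img * k).sum())  # filter evaluated at the image center
```

Because both Gaussians are normalized, the kernel integrates to roughly zero, so uniform intensity regions produce no response while isolated intensity changes do.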
Author(s): | Kienzle, W. and Wichmann, FA. and Schölkopf, B. and Franz, MO. |
Journal: | 10th T{\"u}binger Wahrnehmungskonferenz (TWK 2007) |
Volume: | 10 |
Pages: | 1 |
Year: | 2007 |
Month: | July |
Day: | 0 |
Bibtex Type: | Poster (poster) |
Digital: | 0 |
Electronic Archiving: | grant_archive |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
BibTeX
@poster{4854,
  title = {Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency},
  journal = {10th T{\"u}binger Wahrnehmungskonferenz (TWK 2007)},
  abstract = {Computational models for bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians filters and a nonlinear scheme that combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can be used to derive a saliency model from human eye movement data with a very small number of additional assumptions. The learned model is much simpler than previous models, but nevertheless has state-of-the-art prediction performance [2]. A central result from this study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes into account temporal intensity variations. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.},
  volume = {10},
  pages = {1},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = jul,
  year = {2007},
  slug = {4854},
  author = {Kienzle, W. and Wichmann, FA. and Sch{\"o}lkopf, B. and Franz, MO.},
  month_numeric = {7}
}