Empirical Inference Poster 2007

Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency

Computational models of bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians (DoG) filters and a nonlinear scheme that combines the filter responses into a real-valued saliency measure [1]. Recently, it was shown that a standard machine learning algorithm can derive a saliency model from human eye movement data with only a very small number of additional assumptions. The learned model is much simpler than previous models, yet achieves state-of-the-art prediction performance [2]. A central result of that study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes temporal intensity variations into account. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.
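As an illustration of the filter-plus-nonlinearity pipeline described above (not the learned model from the poster), the following minimal sketch computes a DoG center-surround response on a grayscale image and passes it through a nonlinearity to obtain a real-valued saliency map. The sigma values, the logistic squashing, and the function name are assumptions chosen for clarity.

```python
# Minimal sketch, assuming a simple DoG filter bank of size one and a
# logistic nonlinearity; this is NOT the authors' learned saliency model.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_saliency(image, sigma_center=1.0, sigma_surround=4.0):
    """Return a real-valued saliency map for a 2D grayscale image."""
    image = image.astype(float)
    center = gaussian_filter(image, sigma_center)      # fine-scale (center) response
    surround = gaussian_filter(image, sigma_surround)  # coarse-scale (surround) response
    response = center - surround                       # DoG center-surround contrast
    # Nonlinear combination of the filter response into a bounded saliency value
    return 1.0 / (1.0 + np.exp(-np.abs(response)))

# Usage: saliency = dog_saliency(frame); high values mark candidate fixation targets.
```

A spatio-temporal extension in the spirit of the poster would feed the model local intensities from several preceding frames (e.g. 200-250 ms before saccade onset) rather than a single static patch.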

Author(s): Kienzle, W. and Wichmann, FA. and Schölkopf, B. and Franz, MO.
Journal: 10th T{\"u}binger Wahrnehmungskonferenz (TWK 2007)
Volume: 10
Pages: 1
Year: 2007
Month: July
Day: 0
Bibtex Type: Poster (poster)
Digital: 0
Electronic Archiving: grant_archive
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik
Links:

BibTeX

@poster{4854,
  title = {Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency},
  journal = {10th T{\"u}binger Wahrnehmungskonferenz (TWK 2007)},
  abstract = {Computational models of bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians (DoG) filters and a nonlinear scheme that combines the filter responses into a real-valued saliency measure [1]. Recently, it was shown that a standard machine learning algorithm can derive a saliency model from human eye movement data with only a very small number of additional assumptions. The learned model is much simpler than previous models, yet achieves state-of-the-art prediction performance [2]. A central result of that study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model.
  Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes temporal intensity variations into account. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.},
  volume = {10},
  pages = {1},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = jul,
  year = {2007},
  slug = {4854},
  author = {Kienzle, W. and Wichmann, FA. and Sch{\"o}lkopf, B. and Franz, MO.},
  month_numeric = {7}
}