Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency
Computational models for bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians filters and a nonlinear scheme that combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can be used to derive a saliency model from human eye movement data with a very small number of additional assumptions. The learned model is much simpler than previous models, but nevertheless has state-of-the-art prediction performance [2]. A central result from this study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes into account temporal intensity variations. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.
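The center-surround filters mentioned in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a generic Difference-of-Gaussians (DoG) kernel, with all sizes and sigmas chosen arbitrarily for illustration, showing how a narrow excitatory center minus a broad inhibitory surround yields a filter that responds to local contrast:

```python
import numpy as np

def dog_kernel(size, sigma_c, sigma_s):
    """Difference-of-Gaussians kernel: normalized narrow center
    Gaussian minus normalized broad surround Gaussian."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_c**2))
    center /= center.sum()
    surround = np.exp(-r2 / (2 * sigma_s**2))
    surround /= surround.sum()
    return center - surround  # sums to ~0: flat regions give no response

# Toy image with a single bright spot at the center; the DoG
# response at that location is positive (local contrast detected).
img = np.zeros((9, 9))
img[4, 4] = 1.0
k = dog_kernel(9, sigma_c=1.0, sigma_s=3.0)
response = float((img * k).sum())  # filter evaluated at the image center
```

Because both Gaussians are normalized, the kernel integrates to roughly zero, so uniform intensity regions produce no response while isolated intensity changes do.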
Author(s): | Kienzle, W. and Wichmann, FA. and Schölkopf, B. and Franz, MO. |
Journal: | 10th T{\"u}binger Wahrnehmungskonferenz (TWK 2007) |
Volume: | 10 |
Pages: | 1 |
Year: | 2007 |
Month: | July |
Day: | 0 |
Bibtex Type: | Poster (poster) |
Digital: | 0 |
Electronic Archiving: | grant_archive |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
BibTeX
@poster{4854,
  title = {Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency},
  journal = {10th T{\"u}binger Wahrnehmungskonferenz (TWK 2007)},
  abstract = {Computational models for bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians filters and a nonlinear scheme that combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can be used to derive a saliency model from human eye movement data with a very small number of additional assumptions. The learned model is much simpler than previous models, but nevertheless has state-of-the-art prediction performance [2]. A central result from this study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes into account temporal intensity variations. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.},
  volume = {10},
  pages = {1},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = jul,
  year = {2007},
  slug = {4854},
  author = {Kienzle, W. and Wichmann, FA. and Sch{\"o}lkopf, B. and Franz, MO.},
  month_numeric = {7}
}