Object Detection Using Deep Learning - Learning where to search using visual attention | Max Planck Institute for Intelligent Systems

Autonomous Motion Master Thesis 2015

Object Detection Using Deep Learning - Learning where to search using visual attention

Alina Kloss

Detecting and identifying the objects in an image quickly and reliably is an important skill for interacting with one’s environment. The main problem is that, in theory, all parts of an image have to be searched for objects at many different scales to make sure that no object instance is missed. However, classifying the content of a given image region takes considerable time and effort, and both the time and the computational capacity that an agent can spend on classification are limited. Humans use a process called visual attention to quickly decide which locations of an image need to be processed in detail and which can be ignored. This allows us to deal with the huge amount of visual information and to use the capacity of our visual system efficiently.

Computer vision faces exactly the same problems, so learning from human behaviour is a promising way to improve existing algorithms. In the presented master’s thesis, a model is trained on eye-tracking data recorded from 15 participants who were asked to search images for objects from three different categories. The model uses a deep convolutional neural network to extract features from the input image, which are then combined to form a saliency map. This map indicates which image regions are of interest when searching for the given target object and can thus be used to reduce the parts of the image that have to be processed in detail. The method is based on a recent publication by Kümmerer et al., but in contrast to the original method, which computes general, task-independent saliency, the presented model is intended to respond differently when searching for different target categories.
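The idea of combining network features into a target-dependent saliency map can be illustrated with a minimal sketch. It is not the thesis implementation: the linear combination with a softmax over pixels, the random arrays standing in for CNN activations, the placeholder category names, and the helper names `saliency_map` and `top_region_mask` are all assumptions made for illustration.

```python
import numpy as np

def saliency_map(features, weights):
    """Combine feature maps into a saliency map (illustrative assumption).

    features: array of shape (C, H, W), standing in for CNN activations.
    weights:  array of shape (C,), one weight per feature channel.
    Returns an (H, W) map of non-negative values that sum to 1.
    """
    # Weighted sum over the channel axis gives one score per pixel.
    logits = np.tensordot(weights, features, axes=1)  # shape (H, W)
    # Softmax over all pixels, shifted by the maximum for numerical stability.
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def top_region_mask(saliency, keep_fraction=0.2):
    """Boolean mask marking the most salient fraction of pixels."""
    k = max(1, int(round(keep_fraction * saliency.size)))
    # Value of the k-th largest pixel; everything at or above it is kept.
    threshold = np.partition(saliency.ravel(), -k)[-k]
    return saliency >= threshold

rng = np.random.default_rng(0)
# Random placeholder features: 8 channels on a 12 x 16 grid.
features = rng.normal(size=(8, 12, 16))
# One hypothetical weight vector per target category, so the same features
# can yield a different map for each search target.
category_weights = {
    name: rng.normal(size=8)
    for name in ("category_a", "category_b", "category_c")
}
maps = {name: saliency_map(features, w) for name, w in category_weights.items()}
masks = {name: top_region_mask(m, keep_fraction=0.2) for name, m in maps.items()}
```

In this sketch only the pixels selected by a mask would be passed on to a more expensive classifier, which is how a saliency map can reduce the parts of the image that are processed in detail. Keeping a separate weight vector per category is one simple way to make the map depend on the search target.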

Author(s):	Alina Kloss
Year:	2015
Month:	May
Day:	26

Project(s):	Modeling Top-Down Saliency for Visual Object Search
Bibtex Type:	Master Thesis (mastersthesis)

School:	Eberhard Karls Universität Tübingen
Attachments:	PDF

BibTeX

@mastersthesis{KlossThesis2015,
  title = {Object Detection Using Deep Learning - Learning where to search using visual attention},
  abstract = {Detecting and identifying the different objects in an image fast and reliably is an
  important skill for interacting with one’s environment. The main problem is that in
  theory, all parts of an image have to be searched for objects on many different scales
  to make sure that no object instance is missed. It however takes considerable time
  and effort to actually classify the content of a given image region and both time
  and computational capacities that an agent can spend on classification are limited.
  Humans use a process called visual attention to quickly decide which locations of
  an image need to be processed in detail and which can be ignored. This allows us
  to deal with the huge amount of visual information and to employ the capacities
  of our visual system efficiently.
  For computer vision, researchers have to deal with exactly the same problems,
  so learning from the behaviour of humans provides a promising way to improve
  existing algorithms. In the presented master’s thesis, a model is trained with eye
  tracking data recorded from 15 participants that were asked to search images for
  objects from three different categories. It uses a deep convolutional neural network
  to extract features from the input image that are then combined to form a saliency
  map. This map provides information about which image regions are interesting
  when searching for the given target object and can thus be used to reduce the
  parts of the image that have to be processed in detail. The method is based on a
  recent publication of Kümmerer et al., but in contrast to the original method that
  computes general, task independent saliency, the presented model is supposed to
  respond differently when searching for different target categories.
  },
  school = {Eberhard Karls Universität Tübingen},
  month = may,
  year = {2015},
  slug = {kloss-thesis-2015},
  author = {Kloss, Alina},
  month_numeric = {5}
}
