
Learning-based Relational Object Matching Across Views

Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can benefit from reasoning on the level of objects. While keypoint-based matching can yield strong results for finding correspondences between images with small to medium viewpoint changes, for large viewpoint changes, matching semantically on the object level becomes advantageous. In this paper, we propose a learning-based approach that combines local keypoints with novel object-level features for matching object detections between RGB images. We train our object-level matching features based on appearance and on intra-frame and cross-frame spatial relations between objects in an associative graph neural network. We demonstrate our approach on a large variety of views of realistically rendered synthetic images. Our approach compares favorably to previous state-of-the-art object-level matching approaches and achieves improved performance over a purely keypoint-based approach for large viewpoint changes.
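The abstract describes an associative graph neural network that fuses object appearance with intra-frame and cross-frame spatial relations before scoring object-to-object matches. As a rough illustration only, the sketch below (plain PyTorch, with invented names, dimensions, and layer choices that are not from the paper) shows one generic way such a relational matcher could be structured: alternating self- and cross-attention rounds over object nodes, followed by a pairwise similarity matrix.

import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    # One message-passing round: queries come from x, keys/values from source.
    # source == x gives intra-frame self-attention; a different source gives
    # cross-frame attention.
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, x, source):
        msg, _ = self.attn(x, source, source)             # relational messages
        return x + self.mlp(torch.cat([x, msg], dim=-1))  # residual node update

class RelationalMatcher(nn.Module):
    def __init__(self, app_dim=256, geo_dim=4, dim=128, rounds=3):
        super().__init__()
        # Joint embedding of an appearance descriptor and box geometry
        # (e.g. normalized center and size) per detected object.
        self.encoder = nn.Linear(app_dim + geo_dim, dim)
        self.self_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(rounds))
        self.cross_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(rounds))

    def forward(self, app0, geo0, app1, geo1):
        x0 = self.encoder(torch.cat([app0, geo0], dim=-1))
        x1 = self.encoder(torch.cat([app1, geo1], dim=-1))
        for sa, ca in zip(self.self_blocks, self.cross_blocks):
            x0, x1 = sa(x0, x0), sa(x1, x1)  # intra-frame relational context
            x0, x1 = ca(x0, x1), ca(x1, x0)  # cross-frame association context
        # Pairwise match scores; a real system would add a "no match" bin and
        # an assignment step (e.g. Sinkhorn) on top of these logits.
        return torch.einsum('bnd,bmd->bnm', x0, x1)

# Toy usage: 5 vs. 7 detections with random appearance features and boxes.
matcher = RelationalMatcher()
scores = matcher(torch.randn(1, 5, 256), torch.rand(1, 5, 4),
                 torch.randn(1, 7, 256), torch.rand(1, 7, 4))
print(scores.shape)  # torch.Size([1, 5, 7])

Weight sharing between the two frames (the same attention blocks process both views) is a common design choice in learned matchers, since object association should be symmetric in the two inputs; the authors' actual architecture and training losses may differ.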

Author(s): Elich, Cathrin and Armeni, Iro and Oswald, Martin R. and Pollefeys, Marc and Stueckler, Joerg
Book Title: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Year: 2023
Bibtex Type: Conference Paper (inproceedings)
DOI: 10.1109/ICRA48891.2023.10161393
State: Published
URL: https://doi.org/10.1109/ICRA48891.2023.10161393

BibTeX

@inproceedings{elich2023relobjmatch,
  title = {Learning-based Relational Object Matching Across Views},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  abstract = {Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can benefit from reasoning on the level of objects. While keypoint-based matching can yield strong results for finding correspondences between images with small to medium viewpoint changes, for large viewpoint changes, matching semantically on the object level becomes advantageous. In this paper, we propose a learning-based approach that combines local keypoints with novel object-level features for matching object detections between RGB images. We train our object-level matching features based on appearance and on intra-frame and cross-frame spatial relations between objects in an associative graph neural network. We demonstrate our approach on a large variety of views of realistically rendered synthetic images. Our approach compares favorably to previous state-of-the-art object-level matching approaches and achieves improved performance over a purely keypoint-based approach for large viewpoint changes.},
  year = {2023},
  slug = {elich2023relobjmatch},
  author = {Elich, Cathrin and Armeni, Iro and Oswald, Martin R. and Pollefeys, Marc and Stueckler, Joerg},
  doi = {10.1109/ICRA48891.2023.10161393},
  url = {https://doi.org/10.1109/ICRA48891.2023.10161393}
}