Perceiving Systems Conference Paper 2021

SPEC: Seeing People in the Wild with an Estimated Camera

Due to the lack of camera parameter information for in-the-wild images, existing 3D human pose and shape (HPS) estimation methods make several simplifying assumptions: weak-perspective projection, large constant focal length, and zero camera rotation. These assumptions often do not hold and we show, quantitatively and qualitatively, that they cause errors in the reconstructed 3D shape and pose. To address this, we introduce SPEC, the first in-the-wild 3D HPS method that estimates the perspective camera from a single image and employs this to reconstruct 3D human bodies more accurately. First, we train a neural network to estimate the field of view, camera pitch, and roll given an input image. We employ novel losses that improve the calibration accuracy over previous work. We then train a novel network that concatenates the camera calibration to the image features and uses these together to regress 3D body shape and pose. SPEC is more accurate than the prior art on the standard benchmark (3DPW) as well as two new datasets with more challenging camera views and varying focal lengths. Specifically, we create a new photorealistic synthetic dataset (SPEC-SYN) with ground truth 3D bodies and a novel in-the-wild dataset (SPEC-MTP) with calibration and high-quality reference bodies.
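The camera quantities named in the abstract (field of view, pitch, roll) can be related to a full perspective projection with a short sketch. This is not the SPEC implementation: the function names, the axis conventions (pitch about the x-axis, roll about the z-axis, composed as roll times pitch), and the use of a vertical field of view are all assumptions made for illustration.

```python
import math


def focal_length_from_vfov(vfov_rad, image_height_px):
    """Standard pinhole relation: f = (H / 2) / tan(vfov / 2), in pixels."""
    return 0.5 * image_height_px / math.tan(0.5 * vfov_rad)


def rotation_from_pitch_roll(pitch_rad, roll_rad):
    """Return a 3x3 rotation (nested lists) as R_roll(z) @ R_pitch(x).

    The axis assignment and composition order are assumptions of this
    sketch, not a convention taken from the paper.
    """
    cp, sp = math.cos(pitch_rad), math.sin(pitch_rad)
    cr, sr = math.cos(roll_rad), math.sin(roll_rad)
    rot_x = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    rot_z = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return [
        [sum(rot_z[i][k] * rot_x[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def project_perspective(point_world, rotation, translation, focal_px, center_px):
    """Rotate and translate a 3D point into the camera frame, then apply
    the perspective divide by depth."""
    cam = [
        sum(rotation[i][k] * point_world[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]
    x, y, z = cam
    return (focal_px * x / z + center_px[0], focal_px * y / z + center_px[1])


# Hypothetical example values: 90 degree vertical FoV, 1000 px tall image.
focal = focal_length_from_vfov(math.pi / 2, 1000)
rot = rotation_from_pitch_roll(0.0, 0.0)  # zero pitch and roll gives identity
pixel = project_perspective(
    [0.5, 0.0, 0.0], rot, [0.0, 0.0, 5.0], focal, (500.0, 500.0)
)
```

With these example values the focal length comes out to approximately 500 px, and the point projects to roughly (550, 500). Setting pitch and roll to zero reproduces the "zero camera rotation" assumption the abstract describes; nonzero angles change where the same 3D point lands in the image.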

Author(s): Muhammed Kocabas and Chun-Hao P. Huang and Joachim Tesch and Lea Müller and Otmar Hilliges and Michael J. Black
Book Title: Proc. International Conference on Computer Vision (ICCV)
Pages: 11015--11025
Year: 2021
Month: October
Publisher: IEEE
Bibtex Type: Conference Paper (inproceedings)
Address: Piscataway, NJ
DOI: 10.1109/ICCV48922.2021.01085
Event Name: International Conference on Computer Vision 2021
Event Place: virtual (originally Montreal, Canada)
State: Published
Electronic Archiving: grant_archive
ISBN: 978-1-6654-2812-5

BibTeX

@inproceedings{Kocabas_SPEC_2021,
  title = {{SPEC}: Seeing People in the Wild with an Estimated Camera},
  booktitle = {Proc. International Conference on Computer Vision (ICCV)},
  abstract = {Due to the lack of camera parameter information for in-the-wild images, existing 3D human pose and shape (HPS) estimation methods make several simplifying assumptions: weak-perspective projection, large constant focal length, and zero camera rotation. These assumptions often do not hold and we show, quantitatively and qualitatively, that they cause errors in the reconstructed 3D shape and pose. To address this, we introduce SPEC, the first in-the-wild 3D HPS method that estimates the perspective camera from a single image and employs this to reconstruct 3D human bodies more accurately. First, we train a neural network to estimate the field of view, camera pitch, and roll given an input image. We employ novel losses that improve the calibration accuracy over previous work. We then train a novel network that concatenates the camera calibration to the image features and uses these together to regress 3D body shape and pose. SPEC is more accurate than the prior art on the standard benchmark (3DPW) as well as two new datasets with more challenging camera views and varying focal lengths. Specifically, we create a new photorealistic synthetic dataset (SPEC-SYN) with ground truth 3D bodies and a novel in-the-wild dataset (SPEC-MTP) with calibration and high-quality reference bodies.},
  pages = {11015--11025},
  publisher = {IEEE},
  address = {Piscataway, NJ},
  month = oct,
  year = {2021},
  slug = {kocabas_spec_2021},
  author = {Kocabas, Muhammed and Huang, Chun-Hao P. and Tesch, Joachim and M\"uller, Lea and Hilliges, Otmar and Black, Michael J.},
  month_numeric = {10}
}