Perceiving Systems Conference Paper 2024

PACE: Human and Camera Motion Estimation from in-the-wild Videos

We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use SLAM as initialization, we propose to tightly integrate SLAM and human motion priors in an optimization that is inspired by bundle adjustment. Specifically, we optimize human and camera motions to match both the observed human pose and scene features. This design combines the strengths of SLAM and motion priors, which leads to significant improvements in human and camera motion estimation. We additionally introduce a motion prior that is suitable for batch optimization, making our approach significantly more efficient than existing approaches. Finally, we propose a novel synthetic dataset that enables evaluating camera motion in addition to human motion from dynamic videos. Experiments on the synthetic and real-world RICH datasets demonstrate that our approach substantially outperforms prior art in recovering both human and camera motions.

Author(s): Muhammed Kocabas and Ye Yuan and Pavlo Molchanov and Yunrong Guo and Michael J. Black and Otmar Hilliges and Jan Kautz and Umar Iqbal
Book Title: International Conference on 3D Vision (3DV 2024)
Year: 2024
Month: March
Bibtex Type: Conference Paper (inproceedings)
Event Name: 3DV 2024
Event Place: Davos, Switzerland
State: Published
Electronic Archiving: grant_archive

BibTeX

@inproceedings{pace2024kocabas,
  title = {{PACE}: Human and Camera Motion Estimation from in-the-wild Videos},
  booktitle = {International Conference on 3D Vision (3DV 2024)},
  abstract = {We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use SLAM as initialization, we propose to tightly integrate SLAM and human motion priors in an optimization that is inspired by bundle adjustment. Specifically, we optimize human and camera motions to match both the observed human pose and scene features. This design combines the strengths of SLAM and motion priors, which leads to significant improvements in human and camera motion estimation. We additionally introduce a motion prior that is suitable for batch optimization, making our approach significantly more efficient than existing approaches. Finally, we propose a novel synthetic dataset that enables evaluating camera motion in addition to human motion from dynamic videos. Experiments on the synthetic and real-world RICH datasets demonstrate that our approach substantially outperforms prior art in recovering both human and camera motions.},
  month = mar,
  year = {2024},
  slug = {pace2024kocabas},
  author = {Kocabas, Muhammed and Yuan, Ye and Molchanov, Pavlo and Guo, Yunrong and Black, Michael J. and Hilliges, Otmar and Kautz, Jan and Iqbal, Umar},
  month_numeric = {3}
}