Perceiving Systems Talk Biography
02 July 2020 at 11:00 - 12:00

Real-time Multi-person 3D Motion Capture with a Single RGB Camera

In our recent work, XNect, we propose a real-time solution for the challenging task of multi-person 3D human pose estimation from a single RGB camera. To achieve real-time performance without compromising accuracy, our approach relies on a new, efficient Convolutional Neural Network architecture and a multi-stage pose formulation. The CNN architecture is approximately 1.3x faster than ResNet-50 while achieving the same accuracy on various tasks, and the benefits extend beyond inference speed to a much smaller training memory footprint and a much higher training throughput. The proposed pose formulation jointly reasons about all the subjects in the scene, ensuring that pose inference can be done in real time even with a large number of subjects in the scene. The key insight behind the accuracy of the formulation is to split the reasoning about human pose into two distinct stages. The first stage, which is fully convolutional, infers the 2D and 3D pose of body parts supported by image evidence, and reasons jointly about all subjects. The second stage, which is a small fully connected network, operates on each individual subject and uses the context of the visible body parts and learned pose priors to infer the 3D pose of the missing body parts. A third stage on top reconciles the 2D and 3D poses per frame and across time to produce a temporally stable kinematic skeleton. In this talk, we will briefly discuss the proposed Convolutional Neural Network architecture and the benefits it might bring to your workflow. The rest of the talk will cover how the pose formulation proposed in this work came to be, what its advantages are, and how it can be extended to other related problems.
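The data flow through the three stages described above can be illustrated with a small Python sketch. This is not the XNect implementation: every name here (`REST_POSE`, `stage1_visible_joints`, `stage2_complete_pose`, `stage3_smooth`, `run_pipeline`) is hypothetical, the networks are replaced by trivial placeholders, and the temporal step is a plain exponential moving average standing in for the skeleton fitting. It only shows which stage sees all subjects at once, which stage works per subject, and which stage carries state across frames.

```python
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Joint = Optional[Vec3]   # None marks a joint without image evidence
Pose = List[Joint]

# Hypothetical stand-in for a learned pose prior: a fixed 3-joint rest pose.
REST_POSE: List[Vec3] = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]


def stage1_visible_joints(frame: Dict[int, Pose]) -> Dict[int, Pose]:
    """Stage 1 placeholder: handles ALL subjects in one call.

    Assumption: the input already holds per-subject joint predictions,
    so this simply copies them instead of running a convolutional network.
    """
    return {sid: list(pose) for sid, pose in frame.items()}


def stage2_complete_pose(pose: Pose) -> List[Vec3]:
    """Stage 2 placeholder: runs on ONE subject and fills missing joints.

    Missing joints are taken from the rest pose, shifted by the mean offset
    of the visible joints (a crude substitute for a learned network).
    """
    visible = [(j, r) for j, r in zip(pose, REST_POSE) if j is not None]
    if not visible:
        return list(REST_POSE)
    n = len(visible)
    offset = tuple(sum(j[k] - r[k] for j, r in visible) / n for k in range(3))
    return [
        j if j is not None else tuple(r[k] + offset[k] for k in range(3))
        for j, r in zip(pose, REST_POSE)
    ]


def stage3_smooth(prev: Optional[List[Vec3]], cur: List[Vec3],
                  alpha: float = 0.5) -> List[Vec3]:
    """Stage 3 placeholder: blends the current pose with the previous one."""
    if prev is None:
        return cur
    return [
        tuple(alpha * c[k] + (1.0 - alpha) * p[k] for k in range(3))
        for p, c in zip(prev, cur)
    ]


def run_pipeline(frames: List[Dict[int, Pose]]) -> List[Dict[int, List[Vec3]]]:
    """Run all three placeholder stages over a sequence of frames."""
    state: Dict[int, List[Vec3]] = {}   # last smoothed pose per subject id
    outputs = []
    for frame in frames:
        visible = stage1_visible_joints(frame)        # joint over all subjects
        result = {}
        for sid, pose in visible.items():             # per-subject work
            full = stage2_complete_pose(pose)
            state[sid] = stage3_smooth(state.get(sid), full)
            result[sid] = state[sid]
        outputs.append(result)
    return outputs


if __name__ == "__main__":
    frames = [
        {0: [(1.0, 0.0, 0.0), None, (1.0, 2.0, 0.0)]},   # middle joint occluded
        {0: [(3.0, 0.0, 0.0), (3.0, 1.0, 0.0), (3.0, 2.0, 0.0)]},
    ]
    for t, poses in enumerate(run_pipeline(frames)):
        print(t, poses)
```

In this toy layout, only the per-subject loop grows with the number of people, and it does very little work per person, which loosely mirrors why the abstract places the expensive reasoning in the stage shared by all subjects.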

Speaker Biography

Dushyant Mehta (Max Planck Institute for Informatics)

PhD candidate

Dushyant Mehta is a PhD student in the Graphics, Vision and Video group at the Max Planck Institute for Informatics. His research interests lie in various aspects of efficient machine learning, as well as 3D human pose estimation and tracking with a monocular RGB camera. For more information, please refer to his homepage: https://people.mpi-inf.mpg.de/~dmetha/.