Perceiving Systems Talk
10 September 2013 at 11:15 | Max Planck Haus Lecture Hall

Depth, You, and the World

Jamie Shotton

Consumer-level depth cameras such as Kinect have changed the landscape of 3D computer vision. In this talk we will discuss two approaches that both learn to directly infer correspondences between observed depth image pixels and 3D model points. These correspondences can then be used to drive an optimization of a generative model to explain the data. The first approach, the "Vitruvian Manifold", aims to fit an articulated 3D human model to a depth camera image, and extends our original Body Part Recognition algorithm used in Kinect. It applies a per-pixel regression forest to infer direct correspondences between image pixels and points on a human mesh model. This allows an efficient "one-shot" continuous optimization of the model parameters to recover the human pose. The second approach, "Scene Coordinate Regression", addresses the problem of camera pose relocalization. It uses a similar regression forest, but now aims to predict correspondences between observed image pixels and 3D world coordinates in an arbitrary 3D scene. These correspondences are again used to drive an efficient optimization that recovers a highly accurate camera pose from a single input frame.
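To illustrate how pixel-to-world correspondences can determine a camera pose, here is a minimal sketch in Python with NumPy. It is not the method presented in the talk: it assumes the correspondences are already given and noise-free (no regression forest, no outlier handling), and it substitutes a closed-form least-squares rigid alignment (the Kabsch algorithm) for the optimization described in the abstract. The names fit_rigid_pose and demo are hypothetical, chosen only for this example.

```python
import numpy as np


def fit_rigid_pose(camera_pts, world_pts):
    """Least-squares rotation R and translation t with world ~= R @ camera + t.

    camera_pts: (N, 3) points back-projected from depth pixels (camera frame).
    world_pts:  (N, 3) predicted 3D world coordinates for the same pixels.
    Uses the Kabsch algorithm; assumes correspondences are correct.
    """
    camera_pts = np.asarray(camera_pts, dtype=float)
    world_pts = np.asarray(world_pts, dtype=float)

    # Centre both point sets so that only the rotation remains to be solved.
    cam_mean = camera_pts.mean(axis=0)
    world_mean = world_pts.mean(axis=0)
    H = (camera_pts - cam_mean).T @ (world_pts - world_mean)

    # The SVD of the 3x3 cross-covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = world_mean - R @ cam_mean
    return R, t


def demo():
    """Recover a known synthetic pose from exact correspondences."""
    rng = np.random.default_rng(0)
    camera_pts = rng.uniform(-1.0, 1.0, size=(50, 3))

    angle = 0.3  # radians, rotation about the z axis (arbitrary test value)
    c, s = np.cos(angle), np.sin(angle)
    R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t_true = np.array([0.5, -0.2, 1.0])

    # Synthetic "predicted" world coordinates for each camera-frame point.
    world_pts = camera_pts @ R_true.T + t_true
    R_est, t_est = fit_rigid_pose(camera_pts, world_pts)
    return R_est, t_est, R_true, t_true
```

With real predictions, some correspondences would be wrong, so a robust wrapper (for example, fitting on random subsets and keeping the pose with the most inliers) would be needed around a step like this.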

Speaker Biography

Jamie Shotton (Microsoft Research)