Optimizing Human Pose and Shape

While data-driven methods that directly regress 3D humans from 2D images are widely popular, optimization-based methods continue to play an important role. Although typically slower than regression, optimization approaches require no training data, can be quickly adapted to new problems, and produce image-aligned results. In our view, the two approaches are not competing but complementary.
Optimization-based approaches directly fit a 3D body model like SMPL to image observations (e.g., detected joint locations, edges, silhouettes, semantic segmentations, etc.). We introduced the first such method, SMPLify [], which optimizes SMPL pose and shape to minimize the 2D error between detected joints and projected SMPL joints. Because of the inherent ambiguity in estimating 3D from 2D, SMPLify introduced a pose prior trained on mocap data and a term that discouraged self-penetration.
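The core of this formulation can be written as a simple objective over SMPL pose and shape. The following is an illustrative PyTorch sketch, not the released SMPLify code: the smpl, pose_prior, and camera-projection interfaces are placeholders, and the interpenetration term is omitted for brevity.

```python
# Minimal sketch of a SMPLify-style objective (illustrative only).
# Assumptions: `smpl(pose, betas)` returns 3D joints of shape (J, 3);
# `pose_prior(pose)` returns a scalar penalty (e.g., a negative log-likelihood
# of a prior trained on mocap data); `joints_2d` and `conf` come from a 2D
# keypoint detector; the principal point is omitted from the projection.
import torch

def project(points_3d, focal, cam_t):
    """Simple pinhole projection of 3D joints to the image plane."""
    p = points_3d + cam_t                    # move joints into the camera frame
    return focal * p[:, :2] / p[:, 2:3]      # perspective divide

def smplify_objective(pose, betas, cam_t, joints_2d, conf,
                      smpl, pose_prior, focal,
                      w_prior=4.0, w_shape=5.0):
    joints_3d = smpl(pose, betas)            # (J, 3) model joints
    proj = project(joints_3d, focal, cam_t)  # (J, 2) projected joints
    # Confidence-weighted 2D reprojection error against the detections
    reproj = (conf * ((proj - joints_2d) ** 2).sum(-1)).sum()
    # Priors regularize the ill-posed 3D-from-2D problem
    prior = w_prior * pose_prior(pose) + w_shape * (betas ** 2).sum()
    return reproj + prior

# Typical use: initialize pose, betas, and cam_t, then minimize this objective
# with a gradient-based optimizer (e.g., L-BFGS), keeping the detections fixed.
```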
With SMPLify-X [], we extended this approach to the expressive SMPL-X model by fitting it to 2D landmarks from OpenPose. SMPLify-X introduced several improvements, including a gender classifier so that the estimated body shapes better match the image. We also introduced a stronger, VAE-based pose prior, VPoser, trained on AMASS, and we improved the interpenetration detection.
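On the prior side, the key change can be sketched as optimizing over VPoser's latent code instead of the full body-pose vector. Again, this is an illustrative sketch with assumed interfaces (vposer_decode, smplx_model), not the SMPLify-X implementation:

```python
# Sketch of fitting in a learned latent pose space (VPoser-style); names are
# illustrative. `vposer_decode(z)` is assumed to map a latent code to a body
# pose, and `smplx_model(body_pose, betas)` to return 3D joints as a tensor.
import torch

def smplifyx_objective(z, betas, cam_t, joints_2d, conf,
                       smplx_model, vposer_decode, focal,
                       w_latent=0.1, w_shape=1.0):
    body_pose = vposer_decode(z)                # decode latent -> body pose
    p = smplx_model(body_pose, betas) + cam_t   # (J, 3) joints in camera frame
    proj = focal * p[:, :2] / p[:, 2:3]         # pinhole projection
    reproj = (conf * ((proj - joints_2d) ** 2).sum(-1)).sum()
    # An L2 penalty on z acts as the pose prior, since VPoser's latent space
    # is trained to be approximately standard normal.
    return reproj + w_latent * (z ** 2).sum() + w_shape * (betas ** 2).sum()
```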
Because images with ground-truth human pose and shape are hard to obtain, these optimization methods provide critical pseudo ground truth for training deep regression networks. For example, we use SMPLify-X to obtain SMPL-X fits to images and use these to train ExPose []. With SPIN [], we showed that an even tighter integration of regression and optimization is valuable. SPIN uses a regressor to initialize SMPLify, which is then run for a few optimization steps, improving the fit. These improved fits are then used to retrain the regressor. By iterating this loop, we incrementally obtain better training data and a better regressor. This training strategy is now widely used.
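A schematic of one iteration of this loop, with hypothetical regressor and run_smplify callables standing in for the actual components, might look as follows; it is a simplified sketch rather than the released SPIN training code:

```python
# Schematic SPIN-style training step (simplified, illustrative).
# `regressor` predicts (pose, betas, camera) from an image crop; `run_smplify`
# stands in for a few optimization steps initialized from that prediction.
import torch

def spin_training_step(regressor, run_smplify, optimizer, images, joints_2d, conf):
    # 1. Regress an initial estimate from the image.
    pred_pose, pred_betas, pred_cam = regressor(images)

    # 2. Refine it with in-the-loop optimization against the 2D keypoints.
    with torch.no_grad():
        opt_pose, opt_betas, opt_cam = run_smplify(
            pred_pose.detach(), pred_betas.detach(), pred_cam.detach(),
            joints_2d, conf)

    # 3. Supervise the regressor with the refined fits (pseudo ground truth).
    loss = ((pred_pose - opt_pose) ** 2).mean() + \
           ((pred_betas - opt_betas) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```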
The basic SMPLify(-X) approach is easily adapted to new problems, making it a foundational tool in our research. For example, we extended it to perform multi-view fitting and to use silhouettes [], which we exploited to create the AGORA [] and SPEC-MTP [] datasets. We use it with aerial vehicles to simultaneously solve for camera extrinsics and body pose in multi-view images []. We adapted it to RGB-D images by including a depth loss and scene contact constraints in the objective function, enabling the creation of the PROX dataset []. We added constraints related to self-contact and exploited this to create the training and test data for TUCH [].
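As an illustration of how such an adaptation enters the objective, a depth term of roughly the following form can be added. The function below is a hypothetical sketch (a one-directional Chamfer distance from body vertices to the back-projected depth point cloud), not the PROX implementation:

```python
# Hypothetical depth term for RGB-D fitting (illustrative only): penalize the
# distance from posed body vertices to the point cloud obtained by
# back-projecting the depth image. Names and weights are placeholders.
import torch

def depth_loss(body_vertices, depth_points, w_depth=1.0):
    """body_vertices: (V, 3) posed mesh vertices; depth_points: (N, 3) point cloud."""
    # For every body vertex, take the distance to its nearest depth point;
    # in practice this would be restricted to visible vertices and wrapped
    # in a robust kernel.
    d = torch.cdist(body_vertices, depth_points)  # (V, N) pairwise distances
    return w_depth * d.min(dim=1).values.mean()
```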