Optimizing Human Pose and Shape

While data-driven methods that directly regress 3D humans from 2D images are widely popular, optimization-based methods continue to play an important role. Although typically slower than regression, optimization approaches require no training data, can be quickly adapted to new problems, and produce image-aligned results. In our view, the two approaches are not competing but complementary.
Optimization-based approaches directly fit a 3D body model like SMPL to image observations (e.g., detected joint locations, edges, silhouettes, semantic segmentations, etc.). We introduced the first such method, SMPLify [], which optimizes SMPL pose and shape to minimize the 2D error between detected joints and projected SMPL joints. Because of the inherent ambiguity in estimating 3D from 2D, SMPLify introduced a pose prior trained on mocap data and a term that discouraged self-penetration.
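The core of this formulation can be written as a simple objective over SMPL pose and shape. The following is an illustrative PyTorch sketch, not the released SMPLify code: the smpl, pose_prior, and camera-projection interfaces are placeholders, and the interpenetration term is omitted for brevity.

```python
# Minimal sketch of a SMPLify-style objective (illustrative only).
# Assumptions: `smpl(pose, betas)` returns 3D joints of shape (J, 3);
# `pose_prior(pose)` returns a scalar penalty (e.g., a negative log-likelihood
# of a prior trained on mocap data); `joints_2d` and `conf` come from a 2D
# keypoint detector; the principal point is omitted from the projection.
import torch

def project(points_3d, focal, cam_t):
    """Simple pinhole projection of 3D joints to the image plane."""
    p = points_3d + cam_t                    # move joints into the camera frame
    return focal * p[:, :2] / p[:, 2:3]      # perspective divide

def smplify_objective(pose, betas, cam_t, joints_2d, conf,
                      smpl, pose_prior, focal,
                      w_prior=4.0, w_shape=5.0):
    joints_3d = smpl(pose, betas)            # (J, 3) model joints
    proj = project(joints_3d, focal, cam_t)  # (J, 2) projected joints
    # Confidence-weighted 2D reprojection error against the detections
    reproj = (conf * ((proj - joints_2d) ** 2).sum(-1)).sum()
    # Priors regularize the ill-posed 3D-from-2D problem
    prior = w_prior * pose_prior(pose) + w_shape * (betas ** 2).sum()
    return reproj + prior

# Typical use: initialize pose, betas, and cam_t, then minimize this objective
# with a gradient-based optimizer (e.g., L-BFGS), keeping the detections fixed.
```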
With SMPLify-X [], we extended this approach to the expressive SMPL-X model by fitting it to 2D landmarks from OpenPose. SMPLify-X introduced several improvements, including a gender classifier so that the estimated body shapes better match the image. We also introduced a stronger, VAE-based pose prior, VPoser, trained on AMASS, and we improved the interpenetration detection.
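On the prior side, the key change can be sketched as optimizing over VPoser's latent code instead of the full body-pose vector. Again, this is an illustrative sketch with assumed interfaces (vposer_decode, smplx_model), not the SMPLify-X implementation:

```python
# Sketch of fitting in a learned latent pose space (VPoser-style); names are
# illustrative. `vposer_decode(z)` is assumed to map a latent code to a body
# pose, and `smplx_model(body_pose, betas)` to return 3D joints as a tensor.
import torch

def smplifyx_objective(z, betas, cam_t, joints_2d, conf,
                       smplx_model, vposer_decode, focal,
                       w_latent=0.1, w_shape=1.0):
    body_pose = vposer_decode(z)                # decode latent -> body pose
    p = smplx_model(body_pose, betas) + cam_t   # (J, 3) joints in camera frame
    proj = focal * p[:, :2] / p[:, 2:3]         # pinhole projection
    reproj = (conf * ((proj - joints_2d) ** 2).sum(-1)).sum()
    # An L2 penalty on z acts as the pose prior, since VPoser's latent space
    # is trained to be approximately standard normal.
    return reproj + w_latent * (z ** 2).sum() + w_shape * (betas ** 2).sum()
```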
Because images with ground-truth human pose and shape are hard to obtain, these optimization methods provide critical pseudo ground truth for training deep regression networks. For example, we use SMPLify-X to obtain SMPL-X fits to images and use these to train ExPose []. With SPIN [], we showed that an even tighter integration of regression and optimization is valuable. SPIN uses a regressor to initialize SMPLify, which is then run for a few optimization steps, improving the fit. These improved fits are then used to retrain the regressor. By iterating this loop, we incrementally obtain better training data and a better regressor. This training strategy is now widely used.
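A schematic of one iteration of this loop, with hypothetical regressor and run_smplify callables standing in for the actual components, might look as follows; it is a simplified sketch rather than the released SPIN training code:

```python
# Schematic SPIN-style training step (simplified, illustrative).
# `regressor` predicts (pose, betas, camera) from an image crop; `run_smplify`
# stands in for a few optimization steps initialized from that prediction.
import torch

def spin_training_step(regressor, run_smplify, optimizer, images, joints_2d, conf):
    # 1. Regress an initial estimate from the image.
    pred_pose, pred_betas, pred_cam = regressor(images)

    # 2. Refine it with in-the-loop optimization against the 2D keypoints.
    with torch.no_grad():
        opt_pose, opt_betas, opt_cam = run_smplify(
            pred_pose.detach(), pred_betas.detach(), pred_cam.detach(),
            joints_2d, conf)

    # 3. Supervise the regressor with the refined fits (pseudo ground truth).
    loss = ((pred_pose - opt_pose) ** 2).mean() + \
           ((pred_betas - opt_betas) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```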
The basic SMPLify(-X) approach is easily adapted to new problems, making it a foundational tool in our research. For example, we extended it to perform multi-view fitting and to use silhouettes [], which we exploited to create the AGORA [] and SPEC-MTP [] datasets. We use it with aerial vehicles to simultaneously solve for camera extrinsics and body pose in multi-view images []. We adapted it to RGB-D images by including a depth loss and scene contact constraints in the objective function, enabling the creation of the PROX dataset []. We added constraints related to self-contact and exploited this to create the training and test data for TUCH [].
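As an illustration of how such an adaptation enters the objective, a depth term of roughly the following form can be added. The function below is a hypothetical sketch (a one-directional Chamfer distance from body vertices to the back-projected depth point cloud), not the PROX implementation:

```python
# Hypothetical depth term for RGB-D fitting (illustrative only): penalize the
# distance from posed body vertices to the point cloud obtained by
# back-projecting the depth image. Names and weights are placeholders.
import torch

def depth_loss(body_vertices, depth_points, w_depth=1.0):
    """body_vertices: (V, 3) posed mesh vertices; depth_points: (N, 3) point cloud."""
    # For every body vertex, take the distance to its nearest depth point;
    # in practice this would be restricted to visible vertices and wrapped
    # in a robust kernel.
    d = torch.cdist(body_vertices, depth_points)  # (V, N) pairwise distances
    return w_depth * d.min(dim=1).values.mean()
```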