Capturing Contact

Understanding and modeling human behavior requires capturing humans as they move through, and interact with, the world. Standard 3D human body pose and shape (HPS) methods estimate bodies in isolation from the objects and people around them, so the results are often physically implausible and miss key information about how a person interacts with their surroundings. We view contact as central to understanding behavior and therefore essential in human motion capture. Our goal is to capture people in the context of the world, where contact is as important as pose.
To study this, we captured the PROX dataset [] using 3D scene scans and an RGB-D sensor to obtain pseudo-ground-truth poses with physically meaningful contact. Knowing the 3D scene enables more accurate HPS estimation from monocular RGB images by exploiting contact and interpenetration constraints.
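As a rough sketch of how scene contact and interpenetration can enter such a fitting objective, the term below assumes a precomputed signed distance function of the scanned scene (`scene_sdf`, negative inside geometry) and a set of candidate contact vertices (`contact_idx`); these names, the robustifier, and the weights are illustrative rather than the exact PROX formulation.

```python
import torch

def scene_contact_loss(verts, scene_sdf, contact_idx,
                       rho=5e-2, w_contact=1.0, w_pen=10.0):
    """Sketch of a contact + interpenetration term for fitting a body
    to a known 3D scene (in the spirit of PROX); interfaces are assumed.

    verts:       (V, 3) posed body vertices in scene coordinates
    scene_sdf:   callable returning the signed distance of each point to
                 the scene surface (negative inside scene geometry)
    contact_idx: indices of body vertices allowed to be in contact
                 (e.g. feet, thighs, palms; an assumption here)
    """
    d = scene_sdf(verts)                      # (V,) signed distances

    # Interpenetration: penalize vertices that end up inside scene geometry.
    loss_pen = (torch.clamp(-d, min=0.0) ** 2).sum()

    # Contact: pull candidate contact vertices towards the scene surface,
    # with a saturating penalty so distant body parts are not dragged in.
    dc = d[contact_idx].abs()
    loss_contact = (dc ** 2 / (dc ** 2 + rho ** 2)).sum()

    return w_contact * loss_contact + w_pen * loss_pen
```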
Using the body-scene contact data from PROX, POSA [] learns a generative model of contact for the vertices of a posed body. We use this body-centric prior in monocular pose estimation to encourage the estimated body to have physically and semantically meaningful scene contacts.
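Under assumptions about its interface, such a prior can act as an additional fitting term: below, `contact_prob` stands in for the per-vertex contact probabilities produced by the learned model for the current pose, and `scene_sdf` is again an assumed scene distance function; this is a sketch, not the exact POSA objective.

```python
import torch

def contact_prior_term(verts, contact_prob, scene_sdf, thresh=0.5, rho=5e-2):
    """Sketch of using a body-centric contact prior (in the spirit of POSA)
    during monocular fitting: vertices the prior marks as likely contacts
    are encouraged to actually touch the scene.

    contact_prob: (V,) per-vertex contact probabilities from the learned
                  prior for the current pose (assumed interface).
    """
    likely = contact_prob > thresh              # vertices expected to touch
    d = scene_sdf(verts[likely]).abs()          # their distance to the scene
    # Saturating penalty, weighted by how confident the prior is about contact.
    return (contact_prob[likely] * d ** 2 / (d ** 2 + rho ** 2)).sum()
```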
TUCH [] explores HPS estimation with self-contact. We create novel datasets of images with known 3D contact poses or contact labels. Using these, and a contact-aware version of SMPLify-X [], we train a regression network using a modified version of SPIN []. TUCH is more accurate not only for images with self-contact but also for non-contact poses.
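A minimal sketch of one ingredient of such contact-aware fitting is a term that pulls annotated (or detected) self-contact vertex pairs onto each other; `contact_pairs` and its format are assumptions, and the full method also has to handle self-interpenetration.

```python
import torch

def self_contact_term(verts, contact_pairs):
    """Sketch of a self-contact pulling term (in the spirit of TUCH's
    contact-aware fitting); the pair annotations are an assumed input.

    verts:         (V, 3) posed body vertices
    contact_pairs: (P, 2) long tensor of vertex index pairs assumed to touch
    """
    vi = verts[contact_pairs[:, 0]]
    vj = verts[contact_pairs[:, 1]]
    # Squared distance between each annotated pair; zero once they touch.
    return ((vi - vj) ** 2).sum(dim=-1).mean()
```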
To learn to regress 3D hands and objects from an image, we created the ObMan dataset by extending a robotic grasp simulator to use the MANO hand model [] and rendering images of hands grasping many objects. We trained a network to regress both object and hand shape, while encouraging contact and avoiding interpenetration.
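The encourage-contact / avoid-interpenetration idea can be sketched as an attraction and a repulsion term, assuming a signed distance function of the reconstructed object (`obj_sdf`, negative inside) and an illustrative proximity threshold; the names and values are placeholders rather than the exact ObMan losses.

```python
import torch

def hand_object_terms(hand_verts, obj_sdf, near=5e-3):
    """Sketch of contact (attraction) and interpenetration (repulsion)
    terms between a predicted hand and object; `obj_sdf` is an assumed
    signed distance function of the predicted object surface.

    hand_verts: (V, 3) predicted hand vertices
    """
    d = obj_sdf(hand_verts)                      # (V,) signed distances

    # Repulsion: hand vertices inside the object are pushed back out.
    loss_pen = torch.clamp(-d, min=0.0).sum()

    # Attraction: vertices already close to the surface are pulled onto it,
    # encouraging an actual grasp rather than a hovering hand.
    close = d.abs() < near
    loss_contact = d[close].abs().sum()

    return loss_contact, loss_pen
```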
To address detailed whole-body contact during object manipulation, we used mocap to create the GRAB dataset []. GRAB goes beyond previous hand-centric datasets to capture actions like drinking, where contact occurs between the cup and the fingers as well as the lips.
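As an illustration of how per-vertex contact can be extracted from such mocap reconstructions, the sketch below simply thresholds body-to-object vertex distances; the threshold value is an assumption and the actual pipeline is more involved.

```python
import torch

def contact_mask(body_verts, obj_verts, thresh=4e-3):
    """Sketch: derive a per-vertex contact mask by thresholding the distance
    from each body vertex to its nearest object vertex (threshold assumed).

    body_verts: (B, 3) body mesh vertices
    obj_verts:  (O, 3) object mesh vertices
    """
    d = torch.cdist(body_verts, obj_verts)     # (B, O) pairwise distances
    return d.min(dim=1).values < thresh        # (B,) boolean contact mask
```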