Inferring and exploiting contact

Humans use touch to interact with each other and the world. While we use our hands and feet to support grasping and locomotion, we also leverage our entire body surface in our daily interactions with objects. We greet, comfort, and bond with each other through touch. To fully model human behavior, we must be able to capture, generate, and understand human-human, human-object, and human-scene contact. To enable this, we have created numerous datasets including ARCTIC, INTERCAP, DAMON, HOT, RICH, and Flickr Fits; see the Datasets Research Field for details.
Contact estimation: We have pioneered the inference of 3D contact from images. Using RICH, we trained BSTRO [], which predicts dense body-scene contacts from an RGB image. Our key insight is that regions in contact are always occluded, so the network uses a transformer to explore the whole image for evidence. Using DAMON, we trained DECO [], a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate vertex-level contact on the SMPL body.
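To make the attention idea concrete, here is a minimal PyTorch sketch of a vertex-level contact head: learnable per-vertex queries cross-attend over image tokens, so even occluded contact regions can gather evidence from the whole image. The module name, feature dimensions, and single attention layer are assumptions for illustration, not BSTRO's or DECO's actual architectures.

```python
import torch
import torch.nn as nn

class VertexContactHead(nn.Module):
    """Score contact for every SMPL vertex by attending over image features."""
    def __init__(self, num_vertices=6890, dim=256, num_heads=8):
        super().__init__()
        # One learnable query per body vertex.
        self.vertex_queries = nn.Parameter(torch.randn(num_vertices, dim))
        # Cross-attention lets each vertex look anywhere in the image.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, image_tokens):
        # image_tokens: (B, N, dim), a flattened backbone feature map.
        B = image_tokens.shape[0]
        q = self.vertex_queries.unsqueeze(0).expand(B, -1, -1)       # (B, V, dim)
        attended, _ = self.cross_attn(q, image_tokens, image_tokens)
        return torch.sigmoid(self.classifier(attended)).squeeze(-1)  # (B, V) in [0, 1]

# Example: per-vertex contact probabilities from a 14x14 feature map.
head = VertexContactHead()
contact = head(torch.randn(2, 196, 256))  # (2, 6890)
```

Such a head would be supervised with binary cross-entropy against per-vertex contact labels, e.g. those in DAMON or derived from RICH.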
Contact and capture: Given a video sequence, MOVER [] estimates both the 3D objects and the human movement in the scene such that the object and human are consistent: if a person is touching an object, they must be in the same place; likewise, the person and the object cannot occupy the same space. We also use LLMs to extract commonsense knowledge about human-object interaction and use this to reason about such interactions in 3D. HOLD [] goes further, jointly reconstructing an interacting hand and object from a monocular video using a compositional articulated implicit model that disentangles the 3D hand and object from 2D images. Contact is also critical for support. With IPMAN [], we formulate intuitive-physics constraints that exploit contact to estimate the center of pressure, center of mass, and base of support, and use these to estimate stable body poses from images. We also exploit contact in estimating dog poses with BARC [][]. With dogs, there is little 3D training data, making pose inference hard; exploiting body-ground contact significantly improves results.
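The two geometric constraints above can be written as simple losses. The sketch below is illustrative rather than MOVER's actual implementation: the contact term pulls vertices labeled as touching onto the object surface, while the penetration term queries a precomputed signed distance field (SDF) of the object; world_to_grid, a helper mapping world coordinates into grid_sample's [-1, 1] volume coordinates, is hypothetical.

```python
import torch
import torch.nn.functional as F

def contact_term(contact_verts, object_points):
    """contact_verts: (Vc, 3) body vertices in contact; object_points: (Vo, 3)
    points sampled on the object surface."""
    d = torch.cdist(contact_verts, object_points)  # (Vc, Vo) pairwise distances
    return d.min(dim=1).values.mean()              # distance to nearest surface point

def penetration_term(body_verts, object_sdf, world_to_grid):
    """body_verts: (V, 3); object_sdf: (1, 1, D, H, W) signed distances,
    negative inside the object; world_to_grid: hypothetical coordinate mapping."""
    grid = world_to_grid(body_verts).view(1, 1, 1, -1, 3)
    sdf = F.grid_sample(object_sdf, grid, align_corners=True).view(-1)
    return F.relu(-sdf).sum()  # only vertices inside the object (sdf < 0) contribute
```

Minimizing the first term enforces "same place" at contact points; the second enforces "not the same space".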
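IPMAN's stability condition can likewise be sketched compactly: a pose is statically stable when the center of mass, projected onto the ground, lies inside the base of support, i.e. the convex hull of the ground-contact vertices. The z = 0 ground convention and per-vertex masses below are illustrative assumptions, not IPMAN's exact formulation.

```python
import numpy as np
from scipy.spatial import Delaunay

def is_statically_stable(verts, masses, contact_mask):
    """verts: (V, 3) body vertices; masses: (V,) per-vertex masses;
    contact_mask: (V,) bool, True where a vertex touches the ground (z = 0)."""
    # Mass-weighted center of mass.
    com = (verts * masses[:, None]).sum(axis=0) / masses.sum()
    # Base of support: ground-plane footprint of the contact vertices.
    support = verts[contact_mask][:, :2]
    if support.shape[0] < 3:
        return False  # no 2D support polygon (also assumes non-collinear points)
    # Stable iff the projected CoM falls inside the support's convex hull.
    return bool(Delaunay(support).find_simplex(com[None, :2])[0] >= 0)
```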
Contact generation: The way individuals position themselves in relation to others, i.e., proxemics, conveys social cues that affect the dynamics of social interaction. Using Flickr Fits, we learn the joint distribution over the 3D poses of two people in close social interaction with a novel denoising diffusion model called BUDDI [].
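As a rough illustration of the generative recipe, the following is a generic DDPM-style training step over the concatenated pose parameters of two people. The MLP denoiser, the 72 pose parameters per person, and the linear noise schedule are assumptions of this sketch; BUDDI's actual denoiser and parameterization differ.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal fraction

POSE_DIM = 2 * 72  # two people x 72 SMPL pose parameters (illustrative)
denoiser = nn.Sequential(                       # stand-in for a transformer denoiser
    nn.Linear(POSE_DIM + 1, 512), nn.SiLU(),
    nn.Linear(512, 512), nn.SiLU(),
    nn.Linear(512, POSE_DIM),
)

def diffusion_loss(x0):
    """One training step; x0: (B, POSE_DIM) pose pairs of interacting people."""
    B = x0.shape[0]
    t = torch.randint(0, T, (B,))                     # random timestep per sample
    noise = torch.randn_like(x0)
    a = alphas_bar[t].unsqueeze(1)                    # (B, 1)
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * noise     # forward noising q(x_t | x_0)
    t_embed = t.float().unsqueeze(1) / T              # crude timestep conditioning
    pred = denoiser(torch.cat([xt, t_embed], dim=1))  # predict the added noise
    return ((pred - noise) ** 2).mean()
```

Sampling runs the learned denoiser backward from Gaussian noise; because training pairs come from people in close interaction, the generated pose pairs inherit plausible contact.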