The AI animator

Generative AI is evolving rapidly, and many argue that it will fully replace traditional graphics. There is nothing really wrong with traditional graphics, however, except that creating realistic images and video with it requires extensive experience and time. So, instead, can we use GenAI to make traditional graphics methods more accessible?
Hair: Traditional hair models are based on strands, and creating hairstyles with strands requires experience. To automate this, MonoHair [] performs high-fidelity reconstruction of hair strands from a monocular video, while Gaussian Haircut [] exploits Gaussian splatting to reconstruct hairstyles from video in the form of strand-aligned 3D Gaussians. With HAAR [], we train the first text-conditioned diffusion model that outputs hairstyles as guide strands represented in a latent space. We upsample these strands, and the result can be rendered using off-the-shelf computer graphics techniques.
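To make this concrete, below is a minimal sketch of HAAR-style generation: a text-conditioned diffusion model denoises a latent guide-strand map, which is then decoded and upsampled into renderable strands. The shapes, step count, and dummy denoiser are illustrative assumptions, not the released model.

    # Minimal sketch of HAAR-style text-conditioned latent diffusion sampling.
    # All names and shapes are illustrative assumptions, not the released code:
    # the trained model denoises a latent map encoding guide strands.
    import torch

    T = 50                                   # number of denoising steps (assumed)
    latent = torch.randn(1, 64, 32, 32)      # latent guide-strand map (assumed shape)
    text_emb = torch.randn(1, 512)           # e.g. a text embedding of "curly bob"

    class DummyDenoiser(torch.nn.Module):
        """Stand-in for the trained denoising network."""
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Conv2d(64, 64, 3, padding=1)
        def forward(self, x, t, cond):
            return self.net(x)               # the real model also uses t and cond

    denoiser = DummyDenoiser()
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = torch.cumprod(1.0 - betas, dim=0)

    # Deterministic DDIM-style sampling loop over the hairstyle latent.
    for t in reversed(range(T)):
        eps = denoiser(latent, t, text_emb)                  # predicted noise
        a_t = alphas[t]
        a_prev = alphas[t - 1] if t > 0 else torch.tensor(1.0)
        x0 = (latent - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # clean-latent estimate
        latent = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    # The latent is then decoded to guide strands and upsampled to a full
    # hairstyle that standard graphics renderers can consume.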
Clothing: Designing, draping, and animating clothing is time consuming. HOOD [] leverages graph neural networks (GNNs), multi-level message passing, and unsupervised training to enable real-time prediction of realistic clothing dynamics. By modeling clothing as a graph, we retain all the benefits of meshes while enabling changes in topology due to elements such as buttons and zippers. ContourCraft [] goes further, simulating complex multi-layer outfits; using a novel Intersection Contour loss term, it can both prevent and resolve intersections in neural cloth simulations.
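As a rough illustration of the simulation backbone, the sketch below runs one message-passing step over garment mesh edges. Feature sizes and MLPs are assumptions; HOOD additionally passes messages across several mesh-hierarchy levels.

    # One GNN message-passing step on a garment mesh, in the spirit of HOOD.
    # Feature widths, MLPs, and random data are toy assumptions.
    import torch

    V, E, F = 1000, 3000, 32                 # vertices, edges, feature width
    x = torch.randn(V, F)                    # per-vertex state (positions, velocities, ...)
    edges = torch.randint(0, V, (E, 2))      # mesh edges as (src, dst) index pairs

    edge_mlp = torch.nn.Sequential(torch.nn.Linear(2 * F, F), torch.nn.ReLU(),
                                   torch.nn.Linear(F, F))
    node_mlp = torch.nn.Sequential(torch.nn.Linear(2 * F, F), torch.nn.ReLU(),
                                   torch.nn.Linear(F, F))

    src, dst = edges[:, 0], edges[:, 1]
    # 1. Compute a message per edge from its two endpoint states.
    msg = edge_mlp(torch.cat([x[src], x[dst]], dim=-1))
    # 2. Sum incoming messages at each destination vertex.
    agg = torch.zeros(V, F).index_add_(0, dst, msg)
    # 3. Update vertex states; stacking such steps (and coarser hierarchy
    #    levels) lets signals propagate quickly across the whole garment.
    x = x + node_mlp(torch.cat([x, agg], dim=-1))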
Head avatars: We have explored methods for estimating animatable 3D head avatars using implicit functions (IMavatar) [], point clouds (PointAvatar) [], meshes (FLARE) [], and a mix of meshes and radiance fields (TECA [], INSTA []). We have also worked to reduce racial bias in skin texture estimates under varied lighting (TRUST) [].
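The representational trade-offs are easiest to see in code. The sketch below shows the implicit-function variant in its simplest form: a coordinate MLP queried at 3D points, conditioned on an expression code. Everything here (network size, conditioning, feature widths) is an illustrative assumption rather than any paper's actual architecture.

    # Toy implicit head-avatar query in the spirit of IMavatar: a coordinate
    # MLP maps a 3D point plus an expression code to geometry and color.
    import torch

    mlp = torch.nn.Sequential(torch.nn.Linear(3 + 50, 128), torch.nn.Softplus(),
                              torch.nn.Linear(128, 4))       # -> (sdf, r, g, b)

    pts = torch.randn(4096, 3)                    # query points along camera rays
    expr = torch.randn(1, 50).expand(4096, 50)    # FLAME-like expression code (assumed size)
    out = mlp(torch.cat([pts, expr], dim=-1))
    sdf, rgb = out[:, :1], out[:, 1:].sigmoid()
    # Rendering integrates such queries along rays; point- and mesh-based
    # avatars (PointAvatar, FLARE) instead rasterize explicit primitives.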
Full-body avatars: Our goal is to make avatar creation so simple that it can be done from a single photo, a text description, or even a personal photo collection. We have led the field over the reporting period with papers that explore numerous ways of creating avatars (generative and regression-based) and numerous representations (implicit, points, meshes, and hybrids). ICON [], ECON [], SCARF [], TeCH [], TADA! [], AG3D [], gDNA [], Fast-SNARF [], SCULPT [], and PuzzleAvatar [] are just some of the methods we developed. All of these methods combine the animation benefits of SMPL-style models with richer shape and appearance representations.
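The common animation backbone these methods inherit from SMPL is linear blend skinning: each vertex is deformed by a skinning-weight blend of per-joint transforms. Below is a minimal sketch with random toy data; real models add learned shape spaces and pose-dependent corrective offsets.

    # Minimal linear blend skinning, the SMPL-style animation backbone.
    # Vertices, weights, and transforms here are random toy data.
    import torch

    V, J = 6890, 24                        # SMPL's vertex and joint counts
    verts = torch.randn(V, 3)              # personalized rest-pose vertices
    weights = torch.softmax(torch.randn(V, J), dim=-1)   # skinning weights
    G = torch.eye(4).expand(J, 4, 4).clone()             # per-joint 4x4 transforms
    G[:, :3, 3] = 0.1 * torch.randn(J, 3)                # toy joint translations

    # Blend joint transforms per vertex, then apply in homogeneous coords.
    T = torch.einsum('vj,jrc->vrc', weights, G)          # (V, 4, 4)
    homo = torch.cat([verts, torch.ones(V, 1)], dim=-1)  # (V, 4)
    posed = torch.einsum('vrc,vc->vr', T, homo)[:, :3]   # animated vertices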