Hands-Object Interaction

Hands allow humans to interact with, and use, physical objects, but capturing hand motion is a challenging computer-vision task. To tackle this, we learn MANO [], a statistical model of the human hand that is trained on many 3D scans and represents the variation in 3D hand shape across a human population. We combine MANO with the SMPL body model and the FLAME face model to obtain the expressive SMPL-X model, which allows us to reconstruct realistic bodies and hands from our 4D scanner, mocap data, or monocular video.
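To illustrate the structure of such a statistical model, here is a minimal sketch of a MANO-style linear shape space, assuming hypothetical arrays (v_template, shapedirs, posedirs) loaded from a model file; it is not the released MANO API, and linear blend skinning is omitted.

```python
import numpy as np

def hand_mesh(v_template, shapedirs, posedirs, betas, pose_feature):
    """Return hand vertices before skinning (a MANO-style sketch).

    v_template:   (V, 3)    mean hand mesh learned from scans
    shapedirs:    (V, 3, K) shape blend shapes (PCA of scan shapes)
    posedirs:     (V, 3, P) pose-dependent corrective blend shapes
    betas:        (K,)      per-subject shape coefficients
    pose_feature: (P,)      flattened rotation features of the hand pose
    """
    v_shaped = v_template + np.einsum('vck,k->vc', shapedirs, betas)
    v_posed = v_shaped + np.einsum('vcp,p->vc', posedirs, pose_feature)
    return v_posed  # linear blend skinning would then pose these vertices
```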
MANO can be fit to noisy input data to reconstruct hands and/or objects [] from a monocular RGB-D or multiview RGB sequence. Interacting motion also helps to recover the unknown kinematic skeleton of objects [].
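Such model fitting can be posed as gradient-based optimization over the model's pose and shape parameters. The sketch below assumes a hypothetical differentiable wrapper hand_model(pose, shape) that returns hand vertices (e.g., MANO as a differentiable layer); it is an illustration, not the fitting pipeline from the papers.

```python
import torch

def fit_hand(hand_model, points, steps=200, lr=0.01):
    """Fit hand pose/shape to a noisy depth point cloud (N, 3)."""
    pose = torch.zeros(45, requires_grad=True)   # per-joint axis-angle params
    shape = torch.zeros(10, requires_grad=True)  # shape coefficients
    opt = torch.optim.Adam([pose, shape], lr=lr)
    for _ in range(steps):
        verts = hand_model(pose, shape)          # (V, 3) model vertices
        d = torch.cdist(points, verts)           # (N, V) pairwise distances
        data_term = d.min(dim=1).values.mean()   # point-to-mesh proxy
        reg = 1e-3 * (pose ** 2).sum() + 1e-3 * (shape ** 2).sum()
        opt.zero_grad()
        (data_term + reg).backward()
        opt.step()
    return pose.detach(), shape.detach()
```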
To directly regress hands and objects, we developed ObMan [], a deep-learning model that integrates MANO as a differentiable network layer and estimates 3D hand and object meshes from an RGB image of a grasp. For training data, we use MANO and ShapeNet objects to generate synthetic images of hand-object grasps. Because ObMan reconstructs the hand and object jointly, the network can encourage contact and discourage interpenetration.
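A rough sketch of such interaction terms is below, assuming a hypothetical signed-distance function hand_sdf(points) that is negative inside the predicted hand; the paper defines its losses on the meshes themselves, so this is only an illustration of the idea.

```python
import torch

def interaction_losses(hand_sdf, object_points, contact_margin=0.005):
    """Contact and interpenetration penalties for hand-object grasps."""
    sd = hand_sdf(object_points)        # (N,) signed distances to the hand
    # Interpenetration: object points inside the hand (sd < 0) are penalized.
    penetration = torch.relu(-sd).mean()
    # Contact: the closest object point should lie near the hand surface.
    attraction = torch.relu(sd.min() - contact_margin)
    return penetration, attraction
```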
Hand-object distance is central to grasping. To model this, we learn a Grasping Field [] that characterizes every point in 3D space by its signed distances to the surfaces of the hand and the object. The hand, the object, and the contact area are represented as implicit surfaces in a common space. The Grasping Field is parameterized by a deep neural network trained on ObMan's synthetic data.
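A minimal sketch of such a parameterization is an MLP that maps a 3D query point, together with a latent code describing the hand-object pair, to two signed distances; the layer sizes and latent code below are assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class GraspingFieldMLP(nn.Module):
    """Map (3D point, latent code) -> signed distances to hand and object."""

    def __init__(self, latent_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # (dist to hand surface, dist to object surface)
        )

    def forward(self, points, z):
        # points: (N, 3) query locations; z: (latent_dim,) hand-object code
        z = z.expand(points.shape[0], -1)
        return self.net(torch.cat([points, z], dim=-1))
```

Points where both predicted distances are near zero then approximate the contact region between the hand and the object.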
ObMan's dataset contains hand grasps synthesized by robotics software, but real human grasps are more varied and natural. Moreover, humans use not only their hands but also their body and face during interactions. We therefore capture GRAB [], a dataset of real whole-body human grasps of objects. Using a high-end MoCap system, we capture 10 subjects interacting with 51 objects and reconstruct 3D SMPL-X [] human meshes interacting with 3D object meshes, including dynamic poses and in-hand manipulation. We use GRAB to train GrabNet, a deep network that generates 3D hand grasps for unseen 3D objects.
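As a rough sketch of such a generator, the decoder below maps a random latent code plus an encoding of the object's shape to MANO-style hand parameters; the object encoding, layer sizes, and output split are assumptions rather than GrabNet's actual architecture.

```python
import torch
import torch.nn as nn

class GraspDecoder(nn.Module):
    """Sample plausible hand grasps conditioned on an object shape code."""

    def __init__(self, latent_dim=16, obj_dim=1024, hidden=512):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim + obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 3 + 45),  # wrist orient + transl + finger pose
        )

    def sample(self, obj_code, n=1):
        z = torch.randn(n, self.latent_dim)      # random grasp latent codes
        obj = obj_code.expand(n, -1)             # (n, obj_dim) object encoding
        out = self.net(torch.cat([z, obj], dim=-1))
        return out.split([3, 3, 45], dim=-1)     # orientation, translation, pose
```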