Hands-Object Interaction

Hands allow humans to interact with, and use, physical objects, but capturing hand motion is a challenging computer-vision task. To tackle this, we learn a statistical model of the human hand [], called MANO, that is trained on many 3D scans of human hands and represents the 3D shape variation across the human population. We combine MANO with the SMPL body model and the FLAME face model to obtain the expressive SMPL-X model, which allows us to reconstruct realistic bodies and hands using data from our 4D scanner, mocap, or monocular video.
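For intuition, here is a minimal sketch of the core idea behind a statistical hand model such as MANO: a mean mesh plus learned shape directions scaled by per-subject coefficients. The array contents are placeholders, and the released model additionally includes pose-dependent corrective blendshapes and linear blend skinning, omitted here; this is not the released model's API.

    import numpy as np

    V, S = 778, 10                       # MANO's mesh has 778 vertices; 10 shape components is typical
    v_template = np.zeros((V, 3))        # mean hand mesh (placeholder values)
    shape_dirs = np.zeros((V, 3, S))     # learned per-vertex directions of shape variation

    def shaped_hand(betas):
        """Return the hand mesh for shape coefficients `betas` of length S."""
        # Each coefficient scales one learned shape direction added to the mean template.
        return v_template + np.einsum('vcs,s->vc', shape_dirs, betas)

    vertices = shaped_hand(0.5 * np.random.randn(S))   # a random plausible hand shape
    print(vertices.shape)                              # (778, 3)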
MANO can be fitted to noisy input data to reconstruct hands and/or objects [] from a monocular RGB-D or multi-view RGB sequence. The motion of interaction also helps to recover the unknown kinematic skeleton of objects [].
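A hedged sketch of what fitting such a model to noisy observations can look like: gradient descent on shape and pose parameters so that the model surface explains an observed point cloud. The mano_layer callable, parameter dimensions, and regularization weight are assumptions standing in for any differentiable hand-model layer, not our actual fitting pipeline.

    import torch

    def fit_hand(mano_layer, observed_points, n_iters=200):
        # mano_layer: assumed differentiable function (betas, pose) -> mesh vertices (V, 3)
        # observed_points: (N, 3) noisy points on the hand, e.g. from an RGB-D sensor
        betas = torch.zeros(10, requires_grad=True)    # shape coefficients
        pose = torch.zeros(48, requires_grad=True)     # global orientation + per-joint rotations
        opt = torch.optim.Adam([betas, pose], lr=0.01)
        for _ in range(n_iters):
            opt.zero_grad()
            verts = mano_layer(betas, pose)
            # one-sided Chamfer term: pull each observed point toward its nearest model vertex
            dists = torch.cdist(observed_points, verts).min(dim=1).values
            loss = dists.mean() + 1e-3 * (betas ** 2).mean()   # simple shape prior
            loss.backward()
            opt.step()
        return betas.detach(), pose.detach()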
To directly regress hands and objects, we developed ObMan [], a deep-learning model that integrates MANO as a network layer and estimates 3D hand and object meshes from a single RGB image of a grasp. For training data, we use MANO and ShapeNet objects to generate synthetic images of hand-object grasps. Because ObMan reconstructs the hand and the object jointly, the network can encourage contact and penalize interpenetration.
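The contact/interpenetration idea can be illustrated with two simple loss terms, sketched below under assumptions: hand vertices already near the object are pulled onto its surface, and vertices with negative signed distance (inside the object) are pushed out. The threshold and the obj_sdf callable are hypothetical and not the paper's exact formulation.

    import torch

    def contact_loss(hand_verts, obj_verts, thresh=0.005):
        """Attract hand vertices that are already near the object (within `thresh` meters)."""
        d = torch.cdist(hand_verts, obj_verts).min(dim=1).values   # distance to nearest object vertex
        near = d < thresh
        return d[near].mean() if near.any() else d.new_zeros(())

    def penetration_loss(hand_verts, obj_sdf):
        """Penalize hand vertices inside the object; `obj_sdf` is an assumed callable
        returning the object's signed distance (negative inside) at query points."""
        return torch.relu(-obj_sdf(hand_verts)).mean()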
Hand-object distance is central to grasping. To model this, we learn a Grasping Field [] that characterizes every point in 3D space by its signed distances to the surfaces of the hand and the object. The hand, the object, and the contact area are represented as implicit surfaces in a common space. The Grasping Field is parameterized by a deep neural network trained on ObMan's synthetic data.
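Below is a hedged sketch of how such a field can be parameterized: an MLP that takes a 3D query point plus a latent code for the hand-object pair and outputs two signed distances, one to the hand surface and one to the object surface. Layer sizes and the conditioning scheme are assumptions for illustration, not the released architecture.

    import torch
    import torch.nn as nn

    class GraspingFieldMLP(nn.Module):
        def __init__(self, latent_dim=256, hidden=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),   # [signed distance to hand, signed distance to object]
            )

        def forward(self, points, latent):
            # points: (N, 3) query locations; latent: (latent_dim,) code for one hand-object pair
            z = latent.expand(points.shape[0], -1)
            return self.net(torch.cat([points, z], dim=-1))

    # Training would regress both distances (e.g. with a clamped L1 loss) against
    # values sampled from ObMan's synthetic hand and object meshes.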
ObMan's dataset contains hand grasps synthesized by robotics software, but real human grasps look more varied and natural. Moreover, humans use not only their hands but also their body and face during interactions. We therefore capture GRAB [], a dataset of real whole-body human grasps of objects. Using a high-end MoCap system, we capture 10 subjects interacting with 51 objects and reconstruct 3D SMPL-X [] human meshes interacting with 3D object meshes, including dynamic poses and in-hand manipulation. We use GRAB to train GrabNet, a deep network that generates 3D hand grasps for unseen 3D objects.
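As a rough illustration of the generation side, the sketch below decodes a latent sample, conditioned on a fixed-length encoding of the object's shape, into hand-model parameters describing a grasp. The object encoding, layer sizes, and parameter count are illustrative assumptions; GrabNet itself is a learned conditional model trained on GRAB.

    import torch
    import torch.nn as nn

    class GraspDecoder(nn.Module):
        def __init__(self, obj_dim=1024, z_dim=16, hidden=512, n_hand_params=61):
            super().__init__()
            # n_hand_params stands in for global orientation + translation + hand articulation
            self.net = nn.Sequential(
                nn.Linear(obj_dim + z_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_hand_params),
            )

        def forward(self, obj_code, z):
            return self.net(torch.cat([obj_code, z], dim=-1))

    decoder = GraspDecoder()
    obj_code = torch.randn(1, 1024)      # placeholder encoding of an unseen object's shape
    z = torch.randn(1, 16)               # sample from the latent prior
    hand_params = decoder(obj_code, z)   # feed to a MANO-style layer to obtain a 3D hand grasp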