Reinforcement Learning and Control
Model-based Reinforcement Learning and Planning
Object-centric Self-supervised Reinforcement Learning
Self-exploration of Behavior
Causal Reasoning in RL
Equation Learner for Extrapolation and Control
Intrinsically Motivated Hierarchical Learner
Regularity as Intrinsic Reward for Free Play
Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation
Natural and Robust Walking from Generic Rewards
Goal-conditioned Offline Planning
Offline Diversity Under Imitation Constraints
Learning Diverse Skills for Local Navigation
Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations
Combinatorial Optimization as a Layer / Blackbox Differentiation
Symbolic Regression and Equation Learning
Representation Learning
Stepsize adaptation for stochastic optimization
Probabilistic Neural Networks
Learning with 3D rotations: A hitchhiker’s guide to SO(3)
Hand-Object Interaction

Hands allow humans to interact with and use physical objects, but capturing hand motion is a challenging computer-vision task. To tackle this, we learn MANO [], a statistical model of the human hand that is trained on many 3D scans and represents the 3D shape variation across the human population. We combine MANO with the SMPL body model and the FLAME face model to obtain the expressive SMPL-X model, which allows us to reconstruct realistic bodies and hands from our 4D scanner, mocap data, or monocular video.
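The idea underlying such statistical shape models can be sketched as a linear blend-shape model: a mean template mesh plus learned shape directions weighted by per-subject coefficients. The following toy Python snippet illustrates this pattern; the function, array names, and dimensions are ours for illustration and are not the actual MANO/SMPL-X implementation.

import numpy as np

def statistical_hand_shape(template, shape_basis, betas):
    """Illustrative linear shape model: a mean template mesh plus a
    weighted sum of shape blend-shape directions learned from 3D scans.
    template    : (V, 3) mean hand vertices
    shape_basis : (B, V, 3) principal shape directions (e.g. from PCA)
    betas       : (B,) per-subject shape coefficients
    Returns (V, 3) personalized hand vertices (before pose deformation).
    """
    return template + np.tensordot(betas, shape_basis, axes=1)

# Toy example: 778 vertices (MANO's mesh size) and 10 shape coefficients.
V, B = 778, 10
template = np.zeros((V, 3))
shape_basis = np.random.randn(B, V, 3) * 1e-3
betas = np.random.randn(B)
verts = statistical_hand_shape(template, shape_basis, betas)
print(verts.shape)  # (778, 3)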
MANO can be fitted to noisy input data to reconstruct hands and/or objects [] from a monocular RGB-D or multiview RGB sequence. Interacting motion also helps to recover the unknown kinematic skeleton of objects [].
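Such fitting is typically posed as an optimization over the model's pose and shape parameters: a data term measuring how well the model mesh explains the observations, plus a regularizer keeping the parameters plausible. The sketch below shows this general pattern; model_forward, the parameter sizes, and the loss weights are illustrative assumptions, not the exact objective used in our papers.

import torch

def fit_hand_model(model_forward, observed_points, num_iters=200, lr=0.05):
    """Illustrative optimization-based model fitting: find pose/shape
    parameters whose mesh vertices best explain noisy observed 3D points,
    with a simple L2 prior for regularization.
    model_forward(pose, betas) is assumed to return (V, 3) mesh vertices.
    observed_points is an (N, 3) tensor of noisy depth/multiview points
    already associated with mesh vertices (here N == V for simplicity).
    """
    pose = torch.zeros(45, requires_grad=True)   # hand articulation parameters
    betas = torch.zeros(10, requires_grad=True)  # shape coefficients
    opt = torch.optim.Adam([pose, betas], lr=lr)

    for _ in range(num_iters):
        opt.zero_grad()
        verts = model_forward(pose, betas)
        data_term = ((verts - observed_points) ** 2).sum(dim=-1).mean()
        prior_term = 1e-3 * (pose ** 2).mean() + 1e-3 * (betas ** 2).mean()
        loss = data_term + prior_term
        loss.backward()
        opt.step()
    return pose.detach(), betas.detach()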
To directly regress hands and objects, we developed ObMan [], a deep-learning model that integrates MANO as a network layer to estimate 3D hand and object meshes from an RGB image of a hand grasping an object. For training data, we use MANO and ShapeNet objects to generate synthetic images of hand-object grasps. ObMan's joint hand-object reconstruction allows the network to encourage contact and discourage interpenetration.
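The contact and interpenetration terms can be sketched as follows; the function signature, the threshold, and the use of a signed-distance callable for the object are assumptions for illustration and do not reproduce ObMan's exact losses.

import torch

def contact_and_penetration_losses(hand_verts, object_sdf, contact_thresh=0.005):
    """Illustrative versions of the two interaction terms described above.
    hand_verts        : (V, 3) predicted hand vertices
    object_sdf(points): signed distance to the predicted object surface,
                        negative inside the object (assumed callable)
    Returns (attraction, repulsion):
      - attraction pulls near-surface hand vertices onto the object (contact)
      - repulsion pushes hand vertices out of the object (no interpenetration)
    """
    d = object_sdf(hand_verts)                        # (V,) signed distances
    near = (d.abs() < contact_thresh).float()         # vertices close to the surface
    attraction = (near * d.abs()).sum() / near.sum().clamp(min=1.0)
    penetration = (-d).clamp(min=0.0)                 # depth of vertices inside object
    repulsion = penetration.mean()
    return attraction, repulsion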
Hand-object distance is central to grasping. To model this, we learn a Grasping Field [] that characterizes every point in 3D space by its signed distances to the surfaces of the hand and the object. The hand, the object, and the contact area are represented by implicit surfaces in a common space. The Grasping Field is parameterized with a deep neural network trained on ObMan's synthetic data.
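A minimal sketch of such a network is an MLP that maps a 3D query point, together with a latent code describing the hand-object scene, to two signed distances; the layer sizes and conditioning below are assumptions and not the paper's exact architecture.

import torch
import torch.nn as nn

class GraspingFieldMLP(nn.Module):
    """Minimal sketch of a grasping-field network: an MLP mapping a 3D query
    point plus a scene latent code to two signed distances, one to the hand
    surface and one to the object surface.
    """
    def __init__(self, latent_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # (signed dist to hand, signed dist to object)
        )

    def forward(self, points, latent):
        # points: (N, 3); latent: (latent_dim,) broadcast to every query point
        z = latent.unsqueeze(0).expand(points.shape[0], -1)
        return self.net(torch.cat([points, z], dim=-1))

# Querying the field on random points with a random scene code.
field = GraspingFieldMLP()
sdf_hand_obj = field(torch.randn(1024, 3), torch.randn(256))
print(sdf_hand_obj.shape)  # (1024, 2)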
ObMan's dataset contains hand grasps synthesized by robotics software, but real human grasps look more varied and natural. Moreover, humans use not only their hands but also their body and face during interactions. We therefore capture GRAB [], a dataset of real whole-body human grasps of objects. Using a high-end MoCap system, we capture 10 subjects interacting with 51 objects and reconstruct 3D SMPL-X [] human meshes interacting with 3D object meshes, including dynamic poses and in-hand manipulation. We use GRAB to train GrabNet, a deep network that generates 3D hand grasps for unseen 3D objects.
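Conceptually, such a grasp generator decodes a random latent code, conditioned on an encoding of the object's shape, into hand model parameters. The sketch below is our illustration of this pattern; the layer sizes, the object encoding, and the parameter dimensions are assumptions rather than the released GrabNet.

import torch
import torch.nn as nn

class GraspGenerator(nn.Module):
    """Illustrative conditional grasp generator in the spirit of GrabNet.
    A latent code is decoded, conditioned on a fixed-size object shape
    feature, into hand model parameters (e.g. MANO pose plus global
    translation) describing a candidate grasp of the object.
    """
    def __init__(self, obj_feat_dim=1024, latent_dim=16, hand_param_dim=48 + 3):
        super().__init__()
        self.latent_dim = latent_dim
        self.decoder = nn.Sequential(
            nn.Linear(obj_feat_dim + latent_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, hand_param_dim),
        )

    def sample_grasps(self, obj_features, num_samples=5):
        # obj_features: (obj_feat_dim,) shape encoding of the target object
        z = torch.randn(num_samples, self.latent_dim)
        cond = obj_features.unsqueeze(0).expand(num_samples, -1)
        return self.decoder(torch.cat([cond, z], dim=-1))

# Sampling three candidate grasps for one (randomly encoded) object.
gen = GraspGenerator()
grasps = gen.sample_grasps(torch.randn(1024), num_samples=3)
print(grasps.shape)  # (3, 51)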