Language and Movement


Figure. Top: Our goal is to generate 3D human movements that are grounded in actions using the BABEL dataset [], which consists of dense frame-level action labels that correspond to 3D human movements. Bottom: We identify individual actors in a movie clip and synthesize natural language descriptions of their actions and interactions [].

Understanding human behavior requires more than 3D pose. It requires capturing the semantics of human movement: what a person is doing, how they are doing it, and why. The what and why of human movement (a person's actions, goals, emotions, and mental states) are typically described in natural language. Thus, grounding human movement in language is key to modeling and synthesizing human behavior.

Progress in this direction requires 3D movement data that is precisely aligned with action descriptions. In BABEL [] we label more than 250 actions performed in more than 43 hours of mocap sequences from AMASS []. Fine-grained "frame labels" precisely capture the duration of each action in a sequence. BABEL is being leveraged for tasks like action recognition, temporal action localization, and motion synthesis.
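To make the idea of frame labels concrete, here is a minimal Python sketch of how per-frame action labels can be collapsed into segments with a start, an end, and a duration. It is an illustration only and not the BABEL data format or API: the `ActionSegment` class, the `frames_to_segments` function, the toy labels, and the 30 fps frame rate are all assumptions made for this example. It also assumes at most one label per frame, which real annotations may not satisfy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActionSegment:
    """Hypothetical container for one contiguous run of a single action."""
    label: str        # action name shared by every frame in the segment
    start_frame: int  # first frame of the segment (inclusive)
    end_frame: int    # last frame of the segment (inclusive)

    def duration_seconds(self, fps: float) -> float:
        # Both endpoints are inclusive, hence the +1.
        return (self.end_frame - self.start_frame + 1) / fps


def frames_to_segments(frame_labels: List[Optional[str]]) -> List[ActionSegment]:
    """Collapse runs of identical per-frame labels into segments.

    A label of None marks an unlabeled frame and breaks the current run.
    """
    segments: List[ActionSegment] = []
    for i, label in enumerate(frame_labels):
        if label is None:
            continue
        last = segments[-1] if segments else None
        if last is not None and last.label == label and last.end_frame == i - 1:
            last.end_frame = i  # extend the current run by one frame
        else:
            segments.append(ActionSegment(label, i, i))  # start a new run
    return segments


# Toy sequence with made-up labels; 30 fps is an assumed frame rate.
labels = ["stand"] * 3 + ["walk"] * 6 + [None] * 2 + ["walk"] * 3
segments = frames_to_segments(labels)
for seg in segments:
    print(seg.label, seg.start_frame, seg.end_frame, seg.duration_seconds(fps=30.0))
```

Because each segment keeps explicit frame boundaries, the same structure could serve as a target for temporal action localization or as conditioning for motion synthesis, under the assumptions stated above.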

Since 3D mocap data will always be limited, we would like to learn language-grounded movement from video. Our approach identifies individual actors in a movie clip and synthesizes language descriptions of their actions and interactions []. It first localizes characters by relating their visual appearance to mentions in the movie scripts in a semi-supervised manner. This (noisy) supervision greatly improves the performance of a description model.

ACTOR [] is an example of our work on synthesizing human movement, conditioned on action labels. Despite being trained with noisy data estimated from monocular video, ACTOR's transformer VAE architecture learns to synthesize diverse and realistic movements of varied length.

Members

Abhinanda Ranjit Punnakkal, Guest Scientist (Perceiving Systems, Software Workshop)

Nikos Athanasiou, Guest Scientist (Perceiving Systems)

Maria Alejandra Quiros-Ramirez, Guest Scientist (Perceiving Systems)

Arjun Chandrasekaran, Guest Scientist, Doctoral Researcher (Perceiving Systems)

Gul Varol, Guest Scientist (Perceiving Systems)

Michael Black, Director (Perceiving Systems)

Publications

MotionFix: Text-Driven 3D Human Motion Editing. Athanasiou, N., Cseke, A., Diomataris, M., Black, M. J., Varol, G. In SIGGRAPH Asia 2024 Conference Proceedings, ACM, December 2024.

SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation. Athanasiou, N., Petrovich, M., Black, M. J., Varol, G. In Proc. International Conference on Computer Vision (ICCV), pp. 9984-9995, October 2023.

TEACH: Temporal Action Composition for 3D Humans. Athanasiou, N., Petrovich, M., Black, M. J., Varol, G. In 2022 International Conference on 3D Vision (3DV), pp. 414-423, September 2022.

Action-Conditioned 3D Human Motion Synthesis with Transformer VAE. Petrovich, M., Black, M. J., Varol, G. In Proc. International Conference on Computer Vision (ICCV), pp. 10965-10975, IEEE, Piscataway, NJ, October 2021.

BABEL: Bodies, Action and Behavior with English Labels. Punnakkal, A. R., Chandrasekaran, A., Athanasiou, N., Quiros-Ramirez, M. A., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), pp. 722-731, IEEE, Piscataway, NJ, June 2021.

Generating Descriptions with Grounded and Co-Referenced People. Rohrbach, A., Rohrbach, M., Tang, S., Oh, S. J., Schiele, B. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4196-4206, IEEE, Piscataway, NJ, USA, July 2017.