Inferring Actions

Human behavior can be described at multiple levels. At the lowest level, we observe the 3D pose of the body over time. Poses can be organized into primitives that capture the coordinated activity of different body parts. Primitives, in turn, combine into more complex actions. At the most abstract level, behavior can be described semantically in terms of actions and goals.

The BABEL dataset [Punnakkal et al., 2021] contains labels for the actions performed by subjects in mocap sequences from AMASS. BABEL is larger and more complex than existing 3D action recognition datasets, which makes action recognition on it challenging. It has a long-tailed action distribution and significant intra-class variance, and multiple actions are frequently performed simultaneously. These characteristics resemble real-world data, suggesting that BABEL can drive progress in the field.

Human movement typically consists of a succession of different actions. In addition to asking what actions occur, Temporal Action Localization (TAL) asks when they occur, i.e., the start and end of each action in the video.
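To make the TAL output concrete, here is a minimal sketch of turning per-frame action labels into (action, start, end) segments. The function name `labels_to_segments`, the example labels, and the assumption of exactly one label per frame are illustrative choices and are not taken from the work described here; simultaneous actions would need one such label track per action.

```python
from itertools import groupby
from typing import List, Tuple


def labels_to_segments(frame_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start_frame, end_frame) segments.

    Frame indices are 0-based and end_frame is inclusive.
    Assumes exactly one label per frame (a simplification).
    """
    segments = []
    start = 0
    # groupby yields one group per run of identical consecutive labels
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Hypothetical per-frame predictions for a six-frame clip
frames = ["walk", "walk", "walk", "turn", "turn", "sit"]
print(labels_to_segments(frames))
# → [('walk', 0, 2), ('turn', 3, 4), ('sit', 5, 5)]
```

Each tuple answers both questions at once: which action occurs, and over which frames.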

Prior TAL methods lose important information when aggregating features across successive frames. We develop a novel, learnable bilinear pooling operation that aggregates features while retaining fine-grained temporal information [Zhang et al., 2019]. Experiments demonstrate superior performance to prior work on various datasets.
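The following sketch illustrates why second-order (bilinear) aggregation can retain information that plain averaging discards. It implements generic, non-learnable bilinear pooling over a local temporal window, not the learnable operation from the cited paper; the function names, window definition, and toy features are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer(x: Vector, y: Vector) -> Matrix:
    """Outer product x y^T as a nested list."""
    return [[xi * yj for yj in y] for xi in x]


def local_bilinear_pool(features: List[Vector], center: int, radius: int) -> Matrix:
    """Average the outer products x_t x_t^T over frames within `radius` of `center`.

    The window is clipped at the sequence boundaries.
    """
    lo = max(0, center - radius)
    hi = min(len(features), center + radius + 1)
    window = features[lo:hi]
    dim = len(features[0])
    pooled = [[0.0] * dim for _ in range(dim)]
    for x in window:
        op = outer(x, x)
        for i in range(dim):
            for j in range(dim):
                pooled[i][j] += op[i][j] / len(window)
    return pooled


# Two toy sequences with the same mean feature [0.5, 0.5]:
# in `a` the two channels fire on different frames, in `b` they fire together.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[0.5, 0.5], [0.5, 0.5]]

print(local_bilinear_pool(a, center=0, radius=1))  # → [[0.5, 0.0], [0.0, 0.5]]
print(local_bilinear_pool(b, center=0, radius=1))  # → [[0.25, 0.25], [0.25, 0.25]]
```

Average pooling would map both sequences to the same vector, whereas the pooled second-order statistics differ because they record which feature channels are active together within the window.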

Humans can readily distinguish biological from non-biological motion without training, even from sparse visual cues such as moving dots. In this spirit, we analyze behavior at a low level using a novel dynamic clustering algorithm [Zhang et al., 2018]. Low-level visual cues are aggregated into high-level action patterns, which are then used for the TAL task.

Members

Research Group Leader

Perceiving Systems, Software Workshop

Abhinanda Ranjit Punnakkal

Guest Scientist

Perceiving Systems

Arjun Chandrasekaran

Guest Scientist

Perceiving Systems

Nikos Athanasiou

Guest Scientist

Perceiving Systems

Maria Alejandra Quiros-Ramirez

Guest Scientist

Publications

BABEL: Bodies, Action and Behavior with English Labels. Punnakkal, A. R., Chandrasekaran, A., Athanasiou, N., Quiros-Ramirez, M. A., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), pp. 722-731, IEEE, Piscataway, NJ, June 2021. (Conference Paper; Perceiving Systems)

Local Temporal Bilinear Pooling for Fine-grained Action Parsing. Zhang, Y., Tang, S., Muandet, K., Jarvers, C., Neumann, H. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 12005-12015, June 2019. (Conference Paper; Perceiving Systems, Empirical Inference)

Temporal Human Action Segmentation via Dynamic Clustering. Zhang, Y., Sun, H., Tang, S., Neumann, H. arXiv preprint arXiv:1803.05790, 2018. (Article; Perceiving Systems)