Perzeptive Systeme Conference Paper 2017

On human motion prediction using recurrent neural networks

Julieta Martinez, Michael J. Black, Javier Romero

Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.
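The "simple baseline that does not attempt to model motion at all" is a zero-velocity prediction: the last observed pose is repeated for every future frame. A minimal sketch of this idea (function name and array shapes are illustrative, not from the paper):

```python
import numpy as np

def zero_velocity_baseline(observed, horizon):
    """Predict future poses by repeating the last observed frame.

    observed: array of shape (T, D) -- T observed frames, D pose dimensions
              (e.g. joint angles).
    horizon:  number of future frames to predict.
    Returns an array of shape (horizon, D) in which every row equals the
    final observed pose (constant, zero-velocity prediction).
    """
    last_frame = observed[-1]
    return np.tile(last_frame, (horizon, 1))
```

Despite ignoring all dynamics, such a constant-pose predictor is the baseline the abstract reports as matching state-of-the-art short-term prediction error, which motivates the paper's closer look at evaluation methodology.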

Author(s): Julieta Martinez and Michael J. Black and Javier Romero
Book Title: Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
Pages: 4674-4683
Year: 2017
Month: July
Day: 21-26
Publisher: IEEE
BibTeX Type: Conference Paper (inproceedings)
Address: Piscataway, NJ, USA
Event Name: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Event Place: Honolulu, HI, USA
Electronic Archiving: grant_archive
ISBN: 978-1-5386-0457-1
ISSN: 1063-6919

BibTeX

@inproceedings{Martinez:CVPR:2017,
  title = {On human motion prediction using recurrent neural networks},
  booktitle = {Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017},
  abstract = {Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.},
  pages = {4674-4683},
  publisher = {IEEE},
  address = {Piscataway, NJ, USA},
  month = jul,
  year = {2017},
  slug = {martinez-cvpr-2017},
  author = {Martinez, Julieta and Black, Michael J. and Romero, Javier},
  month_numeric = {7}
}