
BEAT2 Dataset for Holistic Co-Speech Gesture Generation

[BEAT2 teaser image]
The BEAT2 dataset provides 60 hours of high-quality motion capture data in SMPL-X format, paired with the speakers' audio and covering a wide range of emotions. BEAT2 enables training of models that infer full-body gestures and facial motion from audio.
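As a minimal sketch of working with the data, the snippet below inspects a single motion sequence, assuming the SMPL-X parameters are distributed as NumPy .npz archives; the file path and key handling are illustrative assumptions, not the dataset's documented layout, so consult the dataset documentation for the actual file structure.

```python
# Minimal sketch for inspecting one BEAT2 motion file, assuming the SMPL-X
# parameters ship as NumPy .npz archives. The path below is hypothetical;
# replace it with a file from your own download.
import numpy as np

motion = np.load("BEAT2/smplx_example.npz")  # hypothetical path

# List every array stored in the archive together with its shape, e.g.
# per-frame pose, facial expression, and translation parameters.
for key in motion.files:
    print(f"{key}: {np.asarray(motion[key]).shape}")
```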

Publications

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling. Liu, H., Zhu, Z., Becherini, G., Peng, Y., Su, M., Zhou, Y., Zhe, X., Iwamoto, N., Zheng, B., Black, M. J. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1144-1154, June 2024.