Towards Human-Centric Foundation Models: Pretraining Datasets and Unified Architectures

Recent years have witnessed great research interest in human-centric visual computing, spanning tasks such as person re-identification in social surveillance, mesh recovery in the Metaverse, and pedestrian detection in autonomous driving. The recent development of large models offers the opportunity to unify these human-centric tasks and achieve improved performance by merging public datasets from different tasks. This talk will present our recent work on developing unified human-centric models for 2D vision, 3D vision, skeleton-based, and vision-language tasks. We hope our models can be integrated into current large language models to achieve an intelligent human world model.
Speaker Biography
Shixiang Tang (Chinese University of Hong Kong)
Postdoctoral Researcher
Shixiang Tang is a postdoctoral researcher at MMLab, The Chinese University of Hong Kong, working with Professor Wanli Ouyang. Previously, he received his Ph.D. degree from the University of Sydney, Australia, under the supervision of Professor Wanli Ouyang. His research interests revolve around self-supervised learning and human-centric foundation models.