Back
We propose DeepMultiCap, a novel method for multi-person performance capture using sparse multi-view cameras. Our method can capture time varying surface details without the need of using pre-scanned template models. To tackle the serious occlusion challenge for close interacting scenes, we combine a recently proposed pixel-aligned implicit function with a parametric model for robust reconstruction of the invisible surface areas. An effective attention-aware module is designed to obtain the fine-grained geometry details from multi-view images, where high-fidelity results can be generated. In addition to the spatial attention method, for video inputs, we further propose a novel temporal fusion method to alleviate the noise and temporal inconsistencies for moving character reconstruction. For quantitative evaluation, we contribute a high quality multi-person dataset, MultiHuman, which consists of 150 static scenes with different levels of occlusions and ground truth 3D human models. Experimental results demonstrate the state-of-the-art performance of our method and the well generalization to real multiview video data, which outperforms the prior works by a large margin. Multi-person total motion capture is extremely challenging when it comes to handle severe occlusions, different reconstruction granularities from body to face and hands, drastically changing observation scales and fast body movements. To overcome these challenges above, we contribute a lightweight total motion capture system for multi-person interactive scenarios using only sparse multi-view cameras. By contributing a novel hand and face bootstrapping algorithm, our method is capable of efficient localization and accurate association of the hands and faces even on severe occluded occasions. We leverage both pose regression and keypoints detection methods and further propose a unified two-stage parametric fitting method for achieving pixel-aligned accuracy. Moreover, for extremely self-occluded poses and close interactions, a novel feedback mechanism is proposed to propagate the pixel-aligned reconstructions into the next frame for more accurate association. Overall, we propose the first light-weight total capture system and achieves fast, robust and accurate multi-person total motion capture performance. The results and experiments show that our method achieves more accurate results than existing methods under sparse-view setups.
Yuxiang Zhang and Yang Zheng (Tsinghua University)
Yuxiang Zhang is a 3rd year Ph.D student at the Department of Automation, THU, advised by Yebin Liu in BBNC lab. His research focuses on computer vision and graphics, specifically on multi-person markless motion capture and 3D human reconstruction. Yang Zheng is a senior undergraduate student from Tsinghua University. He has been doing research at Broadband Network & Digital Media Lab with Professor Yebin Liu in the field of computer vision. He is expected to graduate as a Bachelor of Engineering from the Department of Automation in July 2022, and a Bachelor of Psychology as my second degree.