TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning

We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction. Unlike previous time-contrastive methods for hand pose estimation, our framework considers temporal consistency in its augmentation scheme, and accounts for the differences in hand poses along the temporal direction. Our data-driven method leverages unlabelled videos and a standard CNN, without relying on synthetic data, pseudo-labels, or specialized architectures. Our approach improves the performance of fully-supervised hand reconstruction methods by 15.9% and 7.6% in PA-V2V on the HO-3D and FreiHAND datasets, respectively, thus establishing new state-of-the-art performance. Finally, we demonstrate that our approach produces smoother hand reconstructions through time and is more robust to heavy occlusions than the previous state-of-the-art, which we show quantitatively and qualitatively.
Author(s): Ziani, Andrea and Fan, Zicong and Kocabas, Muhammed and Christen, Sammy and Hilliges, Otmar
Book Title: 2022 International Conference on 3D Vision (3DV 2022)
Pages: 627--636
Year: 2022
Month: September
Publisher: IEEE
Bibtex Type: Conference Paper (inproceedings)
Address: Piscataway, NJ
DOI: 10.1109/3DV57658.2022.00073
Event Name: International Conference on 3D Vision (3DV 2022)
Event Place: Prague, Czechia
State: Published
URL: https://eth-ait.github.io/tempclr
Electronic Archiving: grant_archive
ISBN: 978-1-6654-5670-8
BibTeX
@inproceedings{ziani2022tempclr,
  title = {{TempCLR}: Reconstructing Hands via Time-Coherent Contrastive Learning},
  booktitle = {2022 International Conference on 3D Vision (3DV 2022)},
  abstract = {We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction. Unlike previous time-contrastive methods for hand pose estimation, our framework considers temporal consistency in its augmentation scheme, and accounts for the differences in hand poses along the temporal direction. Our data-driven method leverages unlabelled videos and a standard CNN, without relying on synthetic data, pseudo-labels, or specialized architectures. Our approach improves the performance of fully-supervised hand reconstruction methods by 15.9% and 7.6% in PA-V2V on the HO-3D and FreiHAND datasets, respectively, thus establishing new state-of-the-art performance. Finally, we demonstrate that our approach produces smoother hand reconstructions through time and is more robust to heavy occlusions than the previous state-of-the-art, which we show quantitatively and qualitatively.},
  pages = {627--636},
  publisher = {IEEE},
  address = {Piscataway, NJ},
  month = sep,
  year = {2022},
  doi = {10.1109/3DV57658.2022.00073},
  isbn = {978-1-6654-5670-8},
  slug = {ziani2022tempclr},
  author = {Ziani, Andrea and Fan, Zicong and Kocabas, Muhammed and Christen, Sammy and Hilliges, Otmar},
  url = {https://eth-ait.github.io/tempclr},
  month_numeric = {9}
}