Perceiving Systems Conference Paper 2022

TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning

We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction. Unlike previous time-contrastive methods for hand pose estimation, our framework considers temporal consistency in its augmentation scheme and accounts for the differences in hand pose along the temporal direction. Our data-driven method leverages unlabelled videos and a standard CNN, without relying on synthetic data, pseudo-labels, or specialized architectures. Our approach improves the performance of fully-supervised hand reconstruction methods by 15.9% and 7.6% in PA-V2V on the HO-3D and FreiHAND datasets, respectively, establishing a new state of the art. Finally, we show quantitatively and qualitatively that our approach produces smoother hand reconstructions over time and is more robust to heavy occlusions than the previous state of the art.
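
For readers unfamiliar with time-contrastive objectives, the sketch below illustrates the general idea with a standard InfoNCE loss in plain Python. It is not the authors' implementation and does not reproduce TempCLR's augmentation scheme. It only assumes that the embedding of a temporally adjacent frame serves as the positive and that embeddings from other clips serve as negatives. The function name `info_nce`, the temperature value, and the toy 2D embeddings are all hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for a single anchor embedding.

    anchor:    embedding of a frame at time t (assumed)
    positive:  embedding of a temporally adjacent frame (assumed)
    negatives: embeddings of frames from other clips (assumed)
    """
    a = normalize(anchor)
    # The positive logit goes first; the negatives follow.
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # Cross-entropy with the positive as the target class.
    return log_sum_exp - logits[0]


# Toy embeddings: a frame, its temporal neighbour, and two frames from other clips.
frame_t = [1.0, 0.0]
frame_t_next = [0.9, 0.1]
other_clips = [[0.0, 1.0], [-1.0, 0.0]]

# Pairing the anchor with its temporal neighbour gives a small loss.
loss_close = info_nce(frame_t, frame_t_next, other_clips)
# Pairing it with a frame from another clip gives a much larger loss.
loss_far = info_nce(frame_t, other_clips[0], [frame_t_next, other_clips[1]])
```

Minimizing this loss pulls embeddings of neighbouring frames together and pushes embeddings from unrelated clips apart. How the positives are chosen and augmented over time is what the paper itself addresses.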

Author(s): Ziani, Andrea and Fan, Zicong and Kocabas, Muhammed and Christen, Sammy and Hilliges, Otmar
Book Title: 2022 International Conference on 3D Vision (3DV 2022)
Pages: 627–636
Year: 2022
Month: September
Publisher: IEEE
Bibtex Type: Conference Paper (inproceedings)
Address: Piscataway, NJ
DOI: 10.1109/3DV57658.2022.00073
Event Name: International Conference on 3D Vision (3DV 2022)
Event Place: Prague, Czechia
State: Published
URL: https://eth-ait.github.io/tempclr
Electronic Archiving: grant_archive
ISBN: 978-1-6654-5670-8

BibTeX

@inproceedings{ziani2022tempclr,
  title = {{TempCLR}: Reconstructing Hands via Time-Coherent Contrastive Learning},
  booktitle = {2022 International Conference on 3D Vision (3DV 2022)},
  abstract = {We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction. Unlike previous time-contrastive methods for hand pose estimation, our framework considers temporal consistency in its augmentation scheme, and accounts for the differences of hand poses along the temporal direction. Our data-driven method leverages unlabelled videos and a standard CNN, without relying on synthetic data, pseudo-labels, or specialized architectures. Our approach improves the performance of fully-supervised hand reconstruction methods by 15.9% and 7.6% in PA-V2V on the HO-3D and FreiHAND datasets respectively, thus establishing new state-of-the-art performance. Finally, we demonstrate that our approach produces smoother hand reconstructions through time, and is more robust to heavy occlusions compared to the previous state-of-the-art which we show quantitatively and qualitatively.},
  pages = {627--636},
  publisher = {IEEE},
  address = {Piscataway, NJ},
  month = sep,
  year = {2022},
  slug = {ziani2022tempclr},
  author = {Ziani, Andrea and Fan, Zicong and Kocabas, Muhammed and Christen, Sammy and Hilliges, Otmar},
  url = {https://eth-ait.github.io/tempclr},
  month_numeric = {9}
}