Perceiving Systems Ph.D. Thesis 2023

Learning Clothed 3D Human Models with Articulated Neural Implicit Representations


3D digital humans are important for a range of applications including movie and game production, virtual and augmented reality, and human-computer interaction. However, existing industrial solutions for creating 3D digital humans rely on expensive scanning devices and intensive manual labor, preventing their broader application. To address these challenges, the research community focuses on learning 3D parametric human models from data, aiming to automatically generate realistic digital humans based on input parameters that specify pose and shape attributes. Although recent advancements have enabled the generation of faithful 3D human bodies, modeling realistic humans that include additional features such as clothing, hair, and accessories remains an open research challenge.

The goal of this thesis is to develop 3D parametric human models that can generate realistic digital humans including not only human bodies but also additional features, in particular clothing. The central challenge lies in the fundamental problem of how to represent non-rigid, articulated, and topology-varying shapes. Explicit geometric representations like polygon meshes lack the flexibility needed to model varying topology between clothing and human bodies, and across different clothing styles. On the other hand, implicit representations, such as signed distance functions, are topologically flexible but do not have a robust articulation algorithm yet.

To tackle this problem, we first introduce a principled algorithm that models articulation for implicit representations, in particular the recently emerging neural implicit representations which have shown impressive modeling fidelity. Our algorithm, SNARF, generalizes linear blend skinning for polygon meshes to implicit representations and can faithfully articulate implicit shapes to any pose. SNARF is fully differentiable, which enables learning skinning weights and shapes jointly from posed observations. By leveraging this algorithm, we can learn single-subject clothed human models with realistic shapes and natural deformations from 3D scans.

We further improve SNARF’s efficiency with several implementation and algorithmic optimizations, including using a more compact representation of the skinning weights, factoring out redundant computations, and custom CUDA kernel implementations. Collectively, these adaptations result in a speedup of 150 times while preserving accuracy, thereby enabling the efficient learning of 3D animatable humans.

Next, we go beyond single-subject modeling and tackle the more challenging task of generative modeling of clothed 3D humans. By integrating our articulation module with deep generative models, we have developed a generative model capable of creating novel 3D humans with various clothing styles and identities, as well as geometric details such as wrinkles.

Lastly, to eliminate the reliance on expensive 3D scans and to facilitate texture learning, we introduce a system that integrates our differentiable articulation module with differentiable volume rendering in an end-to-end manner, enabling the reconstruction of animatable 3D humans directly from 2D monocular videos.

The contributions of this thesis significantly advance the realistic generation and reconstruction of clothed 3D humans and provide new tools for modeling non-rigid, articulated, and topology-varying shapes. We hope that this work will contribute to the development of 3D human modeling and pave the way for new applications in the future.
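The articulation idea summarized above can be made concrete with a small sketch. The PyTorch snippet below is not the thesis code; it is a minimal illustration of forward linear blend skinning applied to query points of an implicit shape, together with a naive fixed-point search for the canonical correspondence of a posed point. SNARF itself solves this correspondence search as a root-finding problem (e.g. with Broyden's method and multiple initializations); the names skin_points, weight_fn, and bone_transforms are illustrative assumptions.

    import torch

    def skin_points(x_canon, weights, bone_transforms):
        """Forward linear blend skinning of query points: x' = sum_i w_i(x) * (B_i x).

        x_canon:         (N, 3) points in the canonical (unposed) space
        weights:         (N, J) skinning weights per point (each row sums to 1)
        bone_transforms: (J, 4, 4) rigid bone transformation matrices
        """
        ones = torch.ones_like(x_canon[:, :1])
        x_h = torch.cat([x_canon, ones], dim=-1)                          # (N, 4) homogeneous coords
        blended = torch.einsum("nj,jab->nab", weights, bone_transforms)   # per-point blended transform
        return torch.einsum("nab,nb->na", blended, x_h)[:, :3]            # (N, 3) posed points

    def find_canonical(x_posed, weight_fn, bone_transforms, iters=30):
        """Search for canonical points whose skinned positions match the posed queries.

        A plain fixed-point iteration is used here purely for illustration;
        the actual method solves this root-finding problem more robustly.
        """
        x = x_posed.clone()
        for _ in range(iters):
            w = weight_fn(x)                                   # skinning weights queried in canonical space
            residual = skin_points(x, w, bone_transforms) - x_posed
            x = x - residual                                   # shrink the forward-skinning residual
        return x

Once a canonical correspondence is found, the occupancy or signed distance of the posed query point can be read off from the canonical shape network at that location; because every step is differentiable, the skinning weights and the canonical shape can be learned jointly from posed observations, as stated in the abstract.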
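One of the efficiency improvements mentioned above is a more compact representation of the skinning weights. As a purely illustrative sketch under assumed names and resolutions (not the thesis implementation), the weights can be stored in a small voxel grid and queried by trilinear interpolation instead of evaluating a network for every query point:

    import torch
    import torch.nn.functional as F

    class VoxelSkinningWeights(torch.nn.Module):
        """Dense grid of per-bone logits, queried by trilinear interpolation."""

        def __init__(self, num_bones=24, resolution=32):
            super().__init__()
            self.logits = torch.nn.Parameter(
                torch.zeros(1, num_bones, resolution, resolution, resolution))

        def forward(self, x):
            """x: (N, 3) canonical points, assumed to lie in [-1, 1]^3 (grid_sample convention)."""
            grid = x.view(1, -1, 1, 1, 3)                                   # (1, N, 1, 1, 3)
            w = F.grid_sample(self.logits, grid, align_corners=True)        # (1, J, N, 1, 1)
            w = w.view(self.logits.shape[1], -1).t()                        # (N, J)
            return torch.softmax(w, dim=-1)                                  # convex weights per point

Replacing per-query network evaluations with such a lookup, and moving the repeated parts of the correspondence search into fused CUDA kernels, illustrates the kind of changes that can trade redundant computation for memory without altering what the model represents.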
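For the final part, reconstruction from monocular video couples the articulation module with differentiable volume rendering. The sketch below shows generic volume rendering along camera rays, with posed-space samples mapped back to canonical space before the field is queried; canonicalize and radiance_field are stand-in interfaces for the articulation module and the canonical geometry/appearance network, not the actual APIs used in the thesis.

    import torch

    def render_rays(origins, directions, canonicalize, radiance_field,
                    near=0.0, far=2.0, n_samples=64):
        """origins, directions: (R, 3) camera rays. Returns (R, 3) rendered colors."""
        t = torch.linspace(near, far, n_samples, device=origins.device)           # (S,)
        pts = origins[:, None, :] + t[None, :, None] * directions[:, None, :]     # (R, S, 3) posed-space samples

        # Map posed-space samples back to canonical space before querying the field.
        pts_canon = canonicalize(pts.reshape(-1, 3)).reshape(pts.shape)

        sigma, rgb = radiance_field(pts_canon)                                     # (R, S), (R, S, 3)
        delta = (far - near) / n_samples
        alpha = 1.0 - torch.exp(-sigma * delta)                                    # opacity per sample
        trans = torch.cumprod(torch.cat(
            [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
        weights = alpha * trans                                                    # front-to-back compositing weights
        return (weights[..., None] * rgb).sum(dim=1)                               # (R, 3)

Because both the rendering and the articulation are differentiable, a photometric loss against video frames can be backpropagated end to end, which is what removes the need for 3D scans in this last setting.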

Author(s): Chen, Xu
Year: 2023
Month: July
Bibtex Type: Ph.D. Thesis (phdthesis)
Degree Type: PhD
DOI: https://doi.org/10.3929/ethz-b-000634303
State: Published

BibTeX

@phdthesis{XuChenThesis,
  title = {Learning Clothed {3D} Human Models with Articulated Neural Implicit Representations},
  abstract = {3D digital humans are important for a range of applications including movie and game production, virtual and augmented reality, and human-computer interaction. However, existing industrial solutions for creating 3D digital humans rely on expensive scanning devices and intensive manual labor, preventing their broader application. To address these challenges, the research community focuses on learning 3D parametric human models from data, aiming to automatically generate realistic digital humans based on input parameters that specify pose and shape attributes. Although recent advancements have enabled the generation of faithful 3D human bodies, modeling realistic humans that include additional features such as clothing, hair, and accessories remains an open research challenge. 
  
  The goal of this thesis is to develop 3D parametric human models that can generate realistic digital humans including not only human bodies but also additional features, in particular clothing. The central challenge lies in the fundamental problem of how to represent non-rigid, articulated, and topology-varying shapes. Explicit geometric representations like polygon meshes lack the flexibility needed to model varying topology between clothing and human bodies, and across different clothing styles. On the other hand, implicit representations, such as signed distance functions, are topologically flexible but do not have a robust articulation algorithm yet.
  
  To tackle this problem, we first introduce a principled algorithm that models articulation for implicit representations, in particular the recently emerging neural implicit representations which have shown impressive modeling fidelity. Our algorithm, SNARF, generalizes linear blend skinning for polygon meshes to implicit representations and can faithfully articulate implicit shapes to any pose. SNARF is fully differentiable, which enables learning skinning weights and shapes jointly from posed observations. By leveraging this algorithm, we can learn single-subject clothed human models with realistic shapes and natural deformations from 3D scans.
  
  We further improve SNARF’s efficiency with several implementation and algorithmic optimizations, including using a more compact representation of the skinning weights, factoring out redundant computations, and custom CUDA kernel implementations. Collectively, these adaptations result in a speedup of 150 times while preserving accuracy, thereby enabling the efficient learning of 3D animatable humans. 
  
  Next, we go beyond single-subject modeling and tackle the more challenging task of generative modeling of clothed 3D humans. By integrating our articulation module with deep generative models, we have developed a generative model capable of creating novel 3D humans with various clothing styles and identities, as well as geometric details such as wrinkles.
  
  Lastly, to eliminate the reliance on expensive 3D scans and to facilitate texture learning, we introduce a system that integrates our differentiable articulation module with differentiable volume rendering in an end-to-end manner, enabling the reconstruction of animatable 3D humans directly from 2D monocular videos.
  
  The contributions of this thesis significantly advance the realistic generation and reconstruction of clothed 3D humans and provide new tools for modeling non-rigid, articulated, and topology-varying shapes. We hope that this work will contribute to the development of 3D human modeling and pave the way for new applications in the future.},
  degree_type = {PhD},
  month = jul,
  year = {2023},
  slug = {xuchenthesis},
  author = {Chen, Xu},
  month_numeric = {7}
}