Events & Talks

Perceiving Systems Talk 28-11-2024 How to predict the inside from the outside? Segment, register, model and infer! Observing and modeling the human body has attracted scientific effort since the earliest times in history. In recent decades, though, several imaging modalities, such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and X-ray, have provided the means to “see” inside the body. Most interestingly, there is growing evidence that the shape of the surface of the human body is highly correlated with its internal properties, for example, the body composition, the size of the bones, and the amount of muscle and adipose tissue (fat). In this talk I will go over ... Marilyn Keller

Perceiving Systems Talk 14-10-2024 Diffusion Models for Human Motion Synthesis Character motion synthesis stands as a central challenge in computer animation and graphics. The successful adaptation of diffusion models to the field boosted synthesis quality and provided intuitive controls such as text and music. One of the earliest and most popular methods to do so is Motion Diffusion Model (MDM) [ICLR 2023]. In this talk, I will review how MDM incorporates domain know-how into the diffusion model and enables intuitive editing capabilities. Then, I will present two recent works, each suggesting a refreshing take on motion diffusion and extending its abilities to new... Omid Taheri

Perceiving Systems Talk 10-10-2024 Reconstruction and Animation of Realistic Head Avatars Digital humans, or realistic avatars, are a centerpiece of future telepresence and special effects systems, and human head modeling is one of their main components. The abovementioned applications, however, are highly demanding in terms of avatar creation speed, as well as realism, and controllability. This talk will focus on the approaches that create controllable and detailed 3D head avatars using the data from consumer-grade devices, such as smartphones, in an uncalibrated and unconstrained capture setting. We will discuss leveraging in-the-wild internet videos and synthetic data sources... Vanessa Sklyarova

Perceiving Systems Talk 26-09-2024 Collaborative Control for Geometry-Conditioned PBR Image Generation Current diffusion models only generate RGB images. If we want to make progress towards graphics-ready 3D content generation, we need a PBR foundation model, but there is not enough PBR data available to train such a model from scratch. We introduce Collaborative Control, which tightly links a new PBR diffusion model to a pre-trained RGB model. We show that this dual architecture does not risk catastrophic forgetting, outputting high-quality PBR images and generalizing well beyond the PBR training dataset. Furthermore, the frozen base model remains compatible with techniques such as IP-Adapter. Soubhik Sanyal

Perceiving Systems Talk 26-09-2024 Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation In this talk, I will present Geometry Image Diffusion (GIMDiffusion), a novel method designed to generate 3D objects from text prompts efficiently. GIMDiffusion uses geometry images, a 2D representation of 3D shapes, which allows the use of existing image-based architectures instead of complex 3D-aware models. This approach reduces computational costs and simplifies the model design. By incorporating Collaborative Control, the method exploits rich priors of pretrained Text-to-Image models like Stable Diffusion, enabling strong generalization even with limited 3D training data. GIMDiffusion ... Soubhik Sanyal

Perceiving Systems Talk 12-09-2024 Generalizable Object-aware Human Motion Synthesis Data-driven virtual 3D character animation has recently witnessed remarkable progress. The realism of virtual characters is a core contributing factor to the quality of computer animations and user experience in immersive applications like games, movies, and VR/AR. However, existing automatic approaches for 3D virtual character motion synthesis supporting scene interactions do not generalize well to new objects outside training distributions, even when trained on extensive motion capture datasets with diverse objects and annotated interactions. In this talk, I will present ROAM, an alternat... Nikos Athanasiou

Perceiving Systems Talk 22-08-2024 Real Virtual Humans With the explosive growth of available training data, 3D human pose and shape estimation is on the verge of a transition to a data-centric paradigm. To leverage data scale, we need flexible models trainable from heterogeneous data sources. To this end, our latest work, Neural Localizer Fields, seamlessly unifies different human pose and shape-related tasks and datasets through the ability - both at training and test time - to query any arbitrary point of the human volume, and obtain its estimated location in 3D, based on a single RGB image. We achieve this by learning a continuous neural field of b... Marilyn Keller

Perceiving Systems Talk 25-07-2024 4D Dynamic Scene Reconstruction, Editing, and Generation. People live in a 4D dynamic moving world. While videos serve as the most convenient medium to capture this dynamic world, they lack the capability to present the 4D nature of our world. Therefore, 4D video reconstruction, free-viewpoint rendering, and high-quality editing and generation offer innovative opportunities for content creation, virtual reality, telepresence, and robotics. Although promising, they also pose significant challenges in terms of efficiency, 4D motion and dynamics, temporal and subject consistency, and text-3D/video alignment. In light of these challenges, this talk wi... Omid Taheri

Perceiving Systems Talk 23-07-2024 Multimodal Social Signal Processing for Human-Robot Interaction Science fiction has long promised us interfaces and robots that interact with us as smoothly as humans do - Rosie the Robot from The Jetsons, C-3PO from Star Wars, and Samantha from Her. Today, interactive robots and voice user interfaces are moving us closer to effortless, human-like interactions in the real world. In this talk, I will discuss the opportunities and challenges in finely analyzing, detecting and generating non-verbal communication in context, including gestures, gaze, auditory signals, and facial expressions. Specifically, I will discuss how we might allow robots and virtual... Yao Feng Michael Black

Perceiving Systems Talk 18-07-2024 Integrating AI Agents into Human Lives via a Simulation Approach With the rapid growth of AI techniques, we might witness the emergence of AI agents entering our lives, reminiscent of new species. Ensuring that these AI agents can integrate well into human life will be a profound challenge. We need these agents to be highly performant, safe, and well-aligned with human values. However, directly training and testing AI agents in real-world environments to guarantee their performance and safety is costly and can disrupt everyday life. Thus, we are exploring a simulation-based approach to incubate these AI agents. In this talk, we will highlight the role of si... Yao Feng

Perceiving Systems Talk 18-07-2024 Recreating Real Garments in Virtual Space with Gaussian Splatting and GNNs Recent advances in scene reconstruction with 3D Gaussian Splatting and cloth simulation with graph neural networks open up the prospect of methods that reconstruct photorealistic virtual garments from visual observations. In this talk we will present our recently submitted paper – Gaussian Garments. There we reconstruct simulation-ready photorealistic garments from multi-view videos. With the power of 3D Gaussian Splatting we are able to match three key aspects of real garments in virtual space: their geometry, appearance and behavior. The resulting virtual garments can then be combined int... Artur Grigorev

Perceiving Systems Talk 08-07-2024 Creating High-End Visuals with Real-Time Technology Creating captivating 3D visuals, particularly photorealistic CGI, demands a diverse range of tools, techniques, and expertise, from concept design to the creation of entire 3D worlds. Linear content generation represents the highest standard of visual quality and has long been a source of inspiration for game developers. In this talk, we will explore the advancements in techniques that have contributed to the rise of real-time technologies in movies and game cinematics. We will delve into projects created with Unreal Engine, such as The Matrix Awakens, Vaulted Halls Entombed (Netflix S... Yao Feng

Perceiving Systems Talk 04-07-2024 Text-Driven 3D Modeling of Avatars Generating 3D objects poses notable challenges due to the limited availability of annotated 3D datasets, unlike their 2D counterparts. Current approaches often resort to models trained on 2D data, resulting in prolonged optimization phases. Conversely, models trained on 3D datasets enable inference without optimization but suffer from limited dataset diversity. This talk explores methodologies for generative 3D modelling of human heads and garments, pivotal for human avatar creation. First, we introduce "Clip-Head," a text-to-textured 3D head generation model that generates a textured NPHM ... Victoria Fernandez Abrevaya

Perceiving Systems Talk 10-06-2024 Towards Human-Centric Foundation Models: Pretraining Datasets and Unified Architectures Recent years have witnessed great research interest in Human-Centric Visual Computing, such as person re-identification in social surveillance, mesh recovery in the Metaverse, and pedestrian detection in autonomous driving. The recent development of large models offers the opportunity to unify these human-centric tasks and achieve improved performance by merging public datasets from different tasks. This talk will present our recent work on developing human-centric unified models on 2D vision, 3D vision, skeleton-based and vision-language tasks. We hope our model will be integrated to the curre... Yandong Wen

Perceiving Systems Talk 02-05-2024 Generative Rendering and Beyond Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models (SORA). Despite great promise, video diffusion models are difficult to control, hindering users from applying their own creativity rather than amplifying it. In this talk, we present a novel approach called Generative Rendering that combines the controllability of dynamic 3D me... Shrisha Bharadwaj Michael Black

Perceiving Systems Talk 04-04-2024 Modeling and Reconstructing Garments with Sewing Patterns The problems of creating new garments (modeling) or reproducing the existing ones (reconstruction) appear in various fields: from fashion production to digital human modeling for the metaverse. The talk introduces approaches to a novel garment creation paradigm: programming-based parametric sewing pattern construction and its application to generating rich synthetic datasets of garments with sewing patterns. We will then discuss how the availability of ground truth sewing patterns allows posing the learning-based garment reconstruction problem as a sewing pattern recovery. Such reformulatio... Yao Feng Michael Black

Perceiving Systems Talk 13-03-2024 Geometric Regularizations for 3D Shape Generation Generative models, which map a latent parameter space to instances in an ambient space, enjoy various applications in 3D Vision and related domains. A standard scheme of these models is probabilistic, which aligns the induced ambient distribution of a generative model from a prior distribution of the latent space with the empirical ambient distribution of training instances. While this paradigm has proven to be quite successful on images, its current applications in 3D generation encounter fundamental challenges in the limited training data and generalization behavior. The key difference be... Yuliang Xiu

Perceiving Systems Talk 18-01-2024 Mining Visual Knowledge from Large Pre-trained Models Computer vision made huge progress in the past decade with the dominant supervised learning paradigm, that is training large-scale neural networks on each task with ever larger datasets. However, in many cases, scalable data or annotation collection is intractable. In contrast, humans can easily adapt to new vision tasks with very little data or labels. In order to bridge this gap, we found that there actually exists rich visual knowledge in large pre-trained models, i.e., models trained on scalable internet images with either self-supervised or generative objectives. And we proposed differ... Yuliang Xiu Yandong Wen

Perceiving Systems Talk 30-11-2023 RAVEN: Rethinking Adversarial Video generation with Efficient tri-plane Networks We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies. To capture these dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a singular latent code to model an entire video sequence. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy reduces computational complexity b... Yandong Wen

Perceiving Systems Talk 19-10-2023 Orthogonal Butterfly: Parameter-Efficient Orthogonal Adaptation of Foundation Models via Butterfly Factorization Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a ... Yandong Wen

Perceiving Systems Talk 12-10-2023 Ghost on the Shell: An Expressive Representation of General 3D Shapes The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they enable 1) fast physics-based rendering with realistic material and lighting, 2) physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open,... Yandong Wen

Perceiving Systems Talk 17-08-2023 Face Exploration - Capture all Degrees of Freedom of the Face High-quality data capture is decisive for scientific work. As a member of the data team, it is a core task of my daily routine to ensure good quality standards in this field. My talk will shed light on the background of this work, starting from scanner set-up and the corresponding data outcome, with a focus on the Face Scanner. It is work that every scientist can profit from in their own projects. I will take the opportunity to present our most recent face capture study named FACE EXPLORATION, of which Timo Bolkart is the leading scientist. A selection of representative sequences including facial m... Yandong Wen

Perceiving Systems Talk 13-07-2023 Full-body avatars from single images and textual guidance The reconstruction of full body appearance of clothed humans from single-view RGB images is a crucial yet challenging task, primarily due to depth ambiguities and the absence of observations from unseen regions. While existing methods have shown impressive results, they still suffer from limitations such as over-smooth surfaces and blurry textures, particularly lacking details at the backside of the avatar. In this talk, I will delve into how we have addressed these limitations by leveraging text guidance and pretrained text-image models, introducing two novel methods. Firstly, I will prese... Hongwei Yi

Perceiving Systems Talk 13-04-2023 Pose, Kinematics, and Dynamics Recovering accurate 3D human pose and shape from monocular input remains a challenging problem despite the rapid advancements powered by deep neural networks. Existing methods have limitations in achieving both robustness and mesh-image alignment, and the estimated pose suffers from physical artifacts such as foot sliding and body leaning. In this talk, we present two new methods to address these limitations. Firstly, we introduce NIKI, an inverse kinematics algorithm that utilizes an invertible neural network to model both the forward kinematics process and the inverse kinematics process. ... Michael Black

Perceiving Systems Talk 29-03-2023 Language is the key to robust vision systems The ability to extend a model beyond the domain of the training data is central to building robust computer vision models. Methods for dealing with unseen test distributions or biased training data often require leveraging additional image data, but linguistic knowledge of the task and potential domain shifts is much cheaper and easier to obtain. In this talk, I will present three recent works that focus on different ways one can improve accuracy with language advice and incomplete training data via large-scale vision and language models. Lea Müller

Perceiving Systems Talk 23-02-2023 Neural Graphics in a Generative World Recent years have seen significant advancements in deep learning, which has led to a growing belief that Moore's law, which traditionally pertained to the packing of transistors, is now transitioning towards the improvement of photo-realistic 3D graphics. The advancements in this research field can be broadly categorized into two areas: neural fields, which are capable of modeling photo-realistic 3D representations, and diffusion models, which are able to generalize to large scale data and produce photo-realistic images. To combine these technologies for large scale 3D generative modeling, ... Sai Kumar Dwivedi

Perceiving Systems Talk 16-02-2023 What do language models tell us about human-object interaction? Research in artificial intelligence (AI) continues to advance quickly and outperforms humans in many tasks, making its way into our daily lives. However, beneath their superior performance, current technologies, limited in how to perceive, process, and understand our visual world, struggle with understanding and interacting with people. These issues raise the core question of my research: How do we build intelligent systems that can interact with people and offer assistance in a natural and seamless way? In this talk, I will present our recent works on using the CLIP model for object intera... Muhammed Kocabas

Perceiving Systems Talk 19-01-2023 Human Motion Generation with Diffusion Models Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages, speech, and music. However, it remains challenging to achieve diverse and fine-grained motion generation with comprehensive condition signals. Inspired by the success in image generation, recent works attempt to apply diffusion models to motion generation tasks (Motion Diffusion Models) and achieve impressive progress in as... Shashank Tripathi

Perceiving Systems Talk 12-01-2023 Data Infrastructure for Scaling up Human Understanding and Modelling to the Real World Human sensing and modelling are fundamental tasks in vision and graphics with numerous applications. However, due to the prohibitive cost, existing datasets are often limited in scale and diversity. This talk shares two of our recent works to tackle data scarcity. First, with the advances of new sensors and algorithms, paired data can be obtained from an inexpensive set-up and an automatic annotation pipeline. Specifically, we demonstrate the data collection solution by introducing HuMMan, a large-scale multimodal 4D human dataset. HuMMan has several appealing properties: 1) multimodal data... Shashank Tripathi

Perceiving Systems Talk 22-09-2022 Combine and conquer: representation learning from multiple data distributions It is becoming less and less controversial to say that the days of learning representations through label supervision are over. Recent work discovers that such regimes are not only expensive, but also suffer from various generalisation/robustness issues. This is somewhat unsurprising, as perceptual data (vision, language) are rich and cannot be well represented by a single label --- doing so inevitably results in the model learning spurious features that trivially correlate with the label. In this talk, I will introduce my work during my PhD at Oxford, which looks at representation learning... Yao Feng

Perceiving Systems Talk 08-09-2022 Computer Vision for Automated Video Editing and Understanding. Video content creation has boomed in recent years. Every day hundreds of thousands of video hours are uploaded to the internet. Thus, video content editing has become more popular and accessible to amateur users. However, current Computer Vision (CV) techniques have not studied technologies to help video editing become a less tedious task. Currently, editors spend hours cutting and stitching videos to deliver final edited videos that convey stories. This cutting process is creative but is often repetitive. With the recent advances in CV, one would expect that a system could learn some cutti... Hongwei Yi

Perceiving Systems Talk 04-08-2022 REALY: Rethinking the Evaluation of 3D Face Reconstruction The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This poses difficulties for precisely diagnosing and improving a 3D face reconstruction method. In this paper, we propose a novel evaluation approach with a new benchmark REALY, consisting of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. Our approach perform... Yandong Wen

Perceiving Systems Talk 28-07-2022 Implicit Neural Representation for Physics-driven Actuated Soft Bodies Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discre... Yao Feng

Perceiving Systems Talk 28-07-2022 Understanding Human Hands in Visual Data Hands are the central means by which humans interact with their surroundings. Understanding human hands helps human behavior analysis and facilitates other visual analysis tasks such as action and gesture recognition. Recently, there has been a surge of interest in understanding first-person visual data, and hands are the dominant interaction entities in such activities. Also, there is an explosion of interest in developing computer vision methods for augmented and virtual reality. To deliver an authentic augmented and virtual reality experience, we need to enable humans to interact with the ... Sai Kumar Dwivedi Dimitris Tzionas

Perceiving Systems Talk 27-07-2022 Complete Codec Telepresence Imagine two people, each of them within their own home, being able to communicate and interact virtually with each other as if they are both present in the same shared physical space. Enabling such an experience, i.e., building a telepresence system that is indistinguishable from reality, is one of the goals of Reality Labs Research (RLR) in Pittsburgh. To this end, we develop key technology that combines fundamental computer vision, machine learning, and graphics techniques based on a novel neural reconstruction and rendering paradigm. In this talk, I will cover our advances towards a neur... Yao Feng

Perceiving Systems Talk 13-06-2022 Shape editing, generation, and stylization Manual authoring of 3D content is a laborious and tedious task. In this talk, I present some of 3DL's recent and on-going efforts toward building tools which provide intuitive control for editing, manipulating, and generating 3D shapes. I will discuss how recent advancements, such as joint vision-language embedding spaces can be used to stylize 3D objects, driven by natural language. Finally, I will conclude with ongoing and future work in this direction, as well as other related areas. Omid Taheri

Perceiving Systems Talk 09-06-2022 Learning to create Digital Humans: Generalizable Radiance Fields for Human Performance Rendering In this work, we aim at synthesizing a free-viewpoint video of an arbitrary human performance using sparse multi-view cameras. Recently, several works have addressed this problem by learning person-specific neural radiance fields (NeRF) to capture the appearance of a particular human. In parallel, some work has proposed using pixel-aligned features to generalize radiance fields to arbitrary new scenes and objects. Adapting such generalization approaches to humans, however, is highly challenging due to the heavy occlusions and dynamic articulations of body parts. To tackle this, we propose a no... Yuliang Xiu

Perceiving Systems Talk 02-05-2022 Learning to estimate 3D human poses without labeled data Estimating 3D human poses from images or videos is a fundamental task in computer vision. However, the limited availability of training data with high-quality 3D pose annotations largely hinders its development and deployment in real applications. In this talk, I will introduce our recent works on training 3D pose estimation models without requiring 3D labeled data. Our first step is to present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards a greater diversity and thus improve generalization of the trained 2D-to-3D pose estimator. Specifically, P... Michael Black

Perceiving Systems Talk 25-04-2022 Leverage Kinematic and Contact constraints for understanding hand-object interaction My work focuses on inferring and understanding the human hand’s interaction with objects from visual inputs, which includes several tasks such as pose estimation, grasping pose generation, and interacting pose transfer. Unlike the single-body pose estimation task, understanding hand-object (multi-body) interactions in 3D space is more challenging, due to the high degree of articulation, the projection ambiguity, self or mutual occlusions, and the complicated physical constraints. Designing algorithms to tackle these challenges is my goal. We find that the mutual contact can provide rich ... Yuliang Xiu

Perceiving Systems Talk 19-04-2022 Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision 3D face reconstruction under occlusions is highly challenging because of the large variability of the appearance and location of occluders. Currently, the most successful methods fit a 3D face model through inverse rendering and assume a given segmentation of the occluder to avoid fitting the occluder. However, the segmentation annotations are costly since training an occlusion segmentation model requires large amounts of annotated data. To overcome this, we introduce a model-based approach for 3D face reconstruction that is highly robust to occlusions but does not require any occlusion ann... Victoria Fernandez Abrevaya

Perceiving Systems Talk 12-04-2022 Mixing Synthetic and Real-World Captures for RGB Hand Pose Estimation How can we learn models for hand pose estimation without any (real-world) labels? This talk presents our recent efforts in tackling the challenging scenario of learning from labelled synthetic data and unlabelled real-world data. I will focus on two strategies that we find to be effective: (1) cross-modal consistency and alignment for representation learning and (2) pseudo-label corrections and refinement. The second part of the talk will introduce Assembly101, our newly recorded dataset that tackles 3D hand pose and action understanding over time. Assembly101 is a new procedural activit... Dimitris Tzionas

Perceiving Systems Talk 07-04-2022 Modeling Humans at Rest with Applications to Robotic Assistance Humans spend a large part of their lives resting. Machine perception of this class of body poses would be beneficial to numerous applications, but it is complicated by line-of-sight occlusion from bedding. Pressure sensing mats are a promising alternative, but data is challenging to collect at scale. To overcome this, we use modern physics engines to simulate bodies resting on a soft bed with a pressure sensing mat. This method can efficiently generate data at scale for training deep neural networks. We present a deep model trained on this data that infers 3D human pose and body shape from ... Dimitris Tzionas Chun-Hao Paul Huang

Perceiving Systems Talk 07-04-2022 Reconstructing Static Scenes and Dynamic Humans with Implicit Neural Representations 3D reconstruction is a long-standing problem in computer vision and has a variety of applications such as virtual reality, 3D content generation, and telepresence. In this talk, I will present our progress on 3D reconstruction of static scenes and dynamic humans with implicit neural representations. The first part of the talk introduces an effective regularization when optimizing implicit neural representations on indoor scenes based on the Manhattan-world Assumption. In the second part, I will show some animatable implicit neural representations for modeling dynamic humans from videos. Hongwei Yi

Perceiving Systems Talk 08-02-2022 Structure-aware Narrative Understanding and Summarization In this work, we analyze and summarize full-length movies from multimodal input (i.e., video, text, audio). We first hypothesize that identifying the narrative structure of movies is a precondition for summarizing them. According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a movie that define the narrative structure and determine its progression and thematic units. Therefore, we introduce the task of Turning Point (TP) identification and leverage it for movie summarization and trailer generation. Next, we propos... Nikos Athanasiou Chun-Hao Paul Huang

Perceiving Systems Talk 18-01-2022 Unified Simulation, Perception, and Generation of Human Behavior Understanding and modeling human behavior is fundamental to almost any computer vision and robotics applications that involve humans. In this talk, I will present a holistic approach to human behavior modeling and tackle its three essential aspects --- simulation, perception, and generation. I will show how the three aspects are deeply connected and how utilizing and improving one aspect can greatly benefit the other aspects. Since humans live in a physical world, we treat simulation as the foundation of our approach and start by developing a fundamental framework for representing human ... Hongwei Yi

Perceiving Systems Talk 14-12-2021 Discrete inverse spectral geometry for shape analysis Spectral quantities such as the eigenvalues of the Laplacian operator are widely used in geometry processing since they provide a very informative summary of the intrinsic geometry of deformable shapes. Typically, the intrinsic properties of shapes are computed from their representation in 3D space and are used to encode compact geometric features, thus adopting a data-reduction principle. On the contrary, this talk focuses on the inverse problem: namely, recovering an extrinsic embedding from a purely intrinsic encoding, like in the classical “hearing the shape of the drum” problem. I will sta... Silvia Zuffi

Perceiving Systems Talk 10-12-2021 Next Generation Lifelike Avatar Creation High-fidelity avatar creation for films and games is tied with complex capture equipment, massive data, a long production cycle, and intensive manual labor by a production team. And it may still be in the notorious Uncanny Valley. In this talk, we will explore how to produce a lifelike avatar in a low-cost way. We will show how to leverage deep learning networks to accelerate and simplify the industrial avatar production procedure from data capturing to animation. And bring photorealism to the next level! Timo Bolkart

Perceiving Systems Talk 05-10-2021 Toward Reconstructing Face from Voice We address a new challenge posed by voice profiling - reconstructing someone’s face from their voice. Specifically, given an audio clip spoken by an unseen person, we aim to reconstruct a face that has as many associations as possible with the speaker in terms of identity. In this talk, I will introduce how we explore and approach the ultimate goal step by step. First, we investigate the audio-visual association by matching voices to faces based on identity, and vice versa. Second, we set up a baseline for reconstructing 2D face images from a voice recording and show reasonable reconstructi... Timo Bolkart

Perceiving Systems Talk 28-09-2021 DeepMultiCap & Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras We propose DeepMultiCap, a novel method for multi-person performance capture using sparse multi-view cameras. Our method can capture time varying surface details without the need of using pre-scanned template models. To tackle the serious occlusion challenge for close interacting scenes, we combine a recently proposed pixel-aligned implicit function with a parametric model for robust reconstruction of the invisible surface areas. An effective attention-aware module is designed to obtain the fine-grained geometry details from multi-view images, where high-fidelity results can be generated. I... Chun-Hao Paul Huang

Perceiving Systems Talk 27-09-2021 Refraction and Absorption for Underwater Shape Recovery In this talk the speaker will present her work on the recovery of rigid and deformable 3D shape from underwater images. Silvia Zuffi
