Events & Talks
Perceiving Systems
Talk
Robin Courant
10-03-2025
How and what to film in virtual environments?
Content creation for movies and video games has been transformed with the rise of virtual environments, yet filming within these digital worlds remains a complex challenge. This talk explores the question: how and what to film in virtual environments? We examine the role of camera control and human interaction across different virtual settings, including NeRF, 3D engines, and video generation.
Victoria Fernandez Abrevaya
Perceiving Systems
Talk
Ailing Zeng
18-02-2025
The Dawn of Video Generation: Preliminary Explorations with SORA-like Models
High-quality video generation—encompassing text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) generation—plays a pivotal role in content creation and world simulation. While several DiT-based models have advanced rapidly in the past year, a thorough exploration of their capabilities, limitations, and alignment with human preferences remains incomplete. In this talk, I will present recent advancements in SORA-like T2V, I2V, and V2V models and products, bridging the gap between academic research and industry applications. Through live demonstrations and comparative analyses, ...
Nikos Athanasiou
Michael Black
Perceiving Systems
Talk
Yannis Siglidis
06-02-2025
Computer Vision at the Mirror Stage: Questioning and Refining Visual Categorization
Advancements in computer vision in predicting and visualizing labels often motivate us to take the relationship between labels and images as a given. Yet the prototypical nature of coherent label sets, such as the alphabet of handwritten characters, can help us question assumed families of handwritten variation.
Nikos Athanasiou
Perceiving Systems
Talk
Sergi Pujades
28-11-2024
How to predict the inside from the outside? Segment, register, model and infer!
Observing and modeling the human body has attracted scientific effort since early in history. In recent decades, though, several imaging modalities, such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and X-ray, have provided the means to “see” inside the body. Most interestingly, there is growing evidence that the shape of the surface of the human body is highly correlated with its internal properties, for example body composition, bone size, and the amount of muscle and adipose tissue (fat). In this talk I will go over ...
Marilyn Keller
Perceiving Systems
Talk
Guy Tevet
14-10-2024
Diffusion Models for Human Motion Synthesis
Character motion synthesis stands as a central challenge in computer animation and graphics. The successful adaptation of diffusion models to the field boosted synthesis quality and provided intuitive controls such as text and music.
One of the earliest and most popular methods to do so is Motion Diffusion Model (MDM) [ICLR 2023]. In this talk, I will review how MDM incorporates domain know-how into the diffusion model and enables intuitive editing capabilities.
Then, I will present two recent works, each suggesting a refreshing take on motion diffusion and extending its abilities to new...
Omid Taheri
Perceiving Systems
Talk
Egor Zakharov
10-10-2024
Reconstruction and Animation of Realistic Head Avatars
Digital humans, or realistic avatars, are a centerpiece of future telepresence and special effects systems, and human head modeling is one of their main components. These applications, however, are highly demanding in terms of avatar creation speed, realism, and controllability. This talk will focus on approaches that create controllable and detailed 3D head avatars using data from consumer-grade devices, such as smartphones, in an uncalibrated and unconstrained capture setting. We will discuss leveraging in-the-wild internet videos and synthetic data sources...
Vanessa Sklyarova
Perceiving Systems
Talk
Simon Donne
26-09-2024
Collaborative Control for Geometry-Conditioned PBR Image Generation
Current diffusion models only generate RGB images. If we want to make progress towards graphics-ready 3D content generation, we need a PBR foundation model, but there is not enough PBR data available to train such a model from scratch. We introduce Collaborative Control, which tightly links a new PBR diffusion model to a pre-trained RGB model. We show that this dual architecture does not risk catastrophic forgetting, outputting high-quality PBR images and generalizing well beyond the PBR training dataset. Furthermore, the frozen base model remains compatible with techniques such as IP-Adapter.
Soubhik Sanyal
Perceiving Systems
Talk
Slava Elizarov
26-09-2024
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
In this talk, I will present Geometry Image Diffusion (GIMDiffusion), a novel method designed to generate 3D objects from text prompts efficiently. GIMDiffusion uses geometry images, a 2D representation of 3D shapes, which allows the use of existing image-based architectures instead of complex 3D-aware models. This approach reduces computational costs and simplifies the model design. By incorporating Collaborative Control, the method exploits rich priors of pretrained Text-to-Image models like Stable Diffusion, enabling strong generalization even with limited 3D training data. GIMDiffusion ...
Soubhik Sanyal
Perceiving Systems
Talk
Wanyue Zhang
12-09-2024
Generalizable Object-aware Human Motion Synthesis
Data-driven virtual 3D character animation has recently witnessed remarkable progress. The realism of virtual characters is a core contributing factor to the quality of computer animations and user experience in immersive applications like games, movies, and VR/AR. However, existing automatic approaches for 3D virtual character motion synthesis supporting scene interactions do not generalize well to new objects outside training distributions, even when trained on extensive motion capture datasets with diverse objects and annotated interactions. In this talk, I will present ROAM, an alternat...
Nikos Athanasiou
Perceiving Systems
Talk
István Sárándi
22-08-2024
Real Virtual Humans
With the explosive growth of available training data, 3D human pose and shape estimation is on the verge of a transition to a data-centric paradigm. To leverage data at scale, we need flexible models trainable from heterogeneous data sources. To this end, our latest work, Neural Localizer Fields, seamlessly unifies different human pose and shape-related tasks and datasets through the ability, both at training and test time, to query any point of the human volume and obtain its estimated location in 3D, based on a single RGB image. We achieve this by learning a continuous neural field of b...
Marilyn Keller
Perceiving Systems
Talk
Jiawei Liu
25-07-2024
4D Dynamic Scene Reconstruction, Editing, and Generation.
People live in a 4D dynamic moving world. While videos serve as the most convenient medium to capture this dynamic world, they lack the capability to present the 4D nature of our world. Therefore, 4D video reconstruction, free-viewpoint rendering, and high-quality editing and generation offer innovative opportunities for content creation, virtual reality, telepresence, and robotics. Although promising, they also pose significant challenges in terms of efficiency, 4D motion and dynamics, temporal and subject consistency, and text-3D/video alignment. In light of these challenges, this talk wi...
Omid Taheri
Perceiving Systems
Talk
Angelica Lim
23-07-2024
Multimodal Social Signal Processing for Human-Robot Interaction
Science fiction has long promised us interfaces and robots that interact with us as smoothly as humans do - Rosie the Robot from The Jetsons, C-3PO from Star Wars, and Samantha from Her. Today, interactive robots and voice user interfaces are moving us closer to effortless, human-like interactions in the real world. In this talk, I will discuss the opportunities and challenges in finely analyzing, detecting and generating non-verbal communication in context, including gestures, gaze, auditory signals, and facial expressions. Specifically, I will discuss how we might allow robots and virtual...
Yao Feng
Michael Black
Perceiving Systems
Talk
Siheng Chen
18-07-2024
Integrating AI Agents into Human Lives via a Simulation Approach
With the rapid growth of AI techniques, we may witness the emergence of AI agents entering our lives, reminiscent of new species. Ensuring that these AI agents integrate well into human life will be a profound challenge. We need these agents to be highly performant, safe, and well aligned with human values. However, directly training and testing AI agents in real-world environments to guarantee their performance and safety is costly and can disrupt everyday life. Thus, we are exploring a simulation-based approach to incubate these AI agents. In this talk, we will highlight the role of si...
Yao Feng
Perceiving Systems
Talk
Boxiang Rong
18-07-2024
Recreating Real Garments in Virtual Space with Gaussian Splatting and GNNs
Recent advances in scene reconstruction with 3D Gaussian Splatting and cloth simulation with graph neural networks open the prospect of methods that reconstruct photorealistic virtual garments from visual observations. In this talk we will present our recently submitted paper, Gaussian Garments, in which we reconstruct simulation-ready photorealistic garments from multi-view videos. With the power of 3D Gaussian Splatting we are able to match three key aspects of real garments in virtual space: their geometry, appearance, and behavior. The resulting virtual garments can then be combined int...
Artur Grigorev
Perceiving Systems
Talk
Yafes Sahin
08-07-2024
Creating High-End Visuals with Real-Time Technology
Creating captivating 3D visuals, particularly photorealistic CGI, demands a diverse range of tools, techniques, and expertise, from concept design to the creation of entire 3D worlds. Linear content generation represents the highest standard of visual quality and has long been a source of inspiration for game developers. In this talk, we will explore the advancements in techniques that have contributed to the rise of real-time technologies in movies and game cinematics.
We will delve into projects created with Unreal Engine, such as The Matrix Awakens, Vaulted Halls Entombed (Netflix S...
Yao Feng
Perceiving Systems
Talk
Pranav Manu
04-07-2024
Text-Driven 3D Modeling of Avatars
Generating 3D objects poses notable challenges due to the limited availability of annotated 3D datasets, unlike their 2D counterparts. Current approaches often resort to models trained on 2D data, resulting in prolonged optimization phases. Conversely, models trained on 3D datasets enable inference without optimization but suffer from limited dataset diversity. This talk explores methodologies for generative 3D modelling of human heads and garments, pivotal for human avatar creation. First, we introduce "Clip-Head," a text-to-textured 3D head generation model that generates a textured NPHM ...
Victoria Fernandez Abrevaya
Perceiving Systems
Talk
Shixiang Tang
10-06-2024
Towards Human-Centric Foundation Models: Pretraining Datasets and Unified Architectures
Recent years have witnessed great research interest in human-centric visual computing, such as person re-identification in social surveillance, mesh recovery in the Metaverse, and pedestrian detection in autonomous driving. The recent development of large models offers the opportunity to unify these human-centric tasks and achieve improved performance by merging public datasets from different tasks. This talk will present our recent work on developing human-centric unified models for 2D vision, 3D vision, skeleton-based, and vision-language tasks. We hope our model will be integrated into the curre...
Yandong Wen
Perceiving Systems
Talk
Shengqu Cai
02-05-2024
Generative Rendering and Beyond
Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models (SORA). Despite great promise, video diffusion models are difficult to control, hindering users from applying their own creativity rather than amplifying it. In this talk, we present a novel approach called Generative Rendering that combines the controllability of dynamic 3D me...
Shrisha Bharadwaj
Michael Black
Perceiving Systems
Talk
Maria Korosteleva
04-04-2024
Modeling and Reconstructing Garments with Sewing Patterns
The problems of creating new garments (modeling) and reproducing existing ones (reconstruction) appear in various fields, from fashion production to digital human modeling for the metaverse. The talk introduces approaches to a novel garment creation paradigm: programming-based parametric sewing pattern construction and its application to generating rich synthetic datasets of garments with sewing patterns. We will then discuss how the availability of ground-truth sewing patterns allows posing the learning-based garment reconstruction problem as sewing pattern recovery. Such reformulatio...
Yao Feng
Michael Black
Perceiving Systems
Talk
Qixing Huang
13-03-2024
Geometric Regularizations for 3D Shape Generation
Generative models, which map a latent parameter space to instances in an ambient space, enjoy various applications in 3D Vision and related domains. A standard scheme of these models is probabilistic, which aligns the induced ambient distribution of a generative model from a prior distribution of the latent space with the empirical ambient distribution of training instances. While this paradigm has proven to be quite successful on images, its current applications in 3D generation encounter fundamental challenges in the limited training data and generalization behavior. The key difference be...
Yuliang Xiu
Perceiving Systems
Talk
Luming Tang
18-01-2024
Mining Visual Knowledge from Large Pre-trained Models
Computer vision has made huge progress in the past decade with the dominant supervised learning paradigm, that is, training large-scale neural networks on each task with ever larger datasets. However, in many cases, scalable data or annotation collection is intractable. In contrast, humans can easily adapt to new vision tasks with very little data or few labels. To bridge this gap, we found that rich visual knowledge actually exists in large pre-trained models, i.e., models trained on scalable internet images with either self-supervised or generative objectives, and we proposed differ...
Yuliang Xiu
Yandong Wen
Perceiving Systems
Talk
Partha Ghosh
30-11-2023
RAVEN: Rethinking Adversarial Video generation with Efficient tri-plane Networks
We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies. To capture these dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a singular latent code to model an entire video sequence. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy reduces computational complexity b...
Yandong Wen
Perceiving Systems
Talk
Weiyang Liu
19-10-2023
Orthogonal Butterfly: Parameter-Efficient Orthogonal Adaptation of Foundation Models via Butterfly Factorization
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a ...
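The parameter savings of a butterfly factorization can be illustrated independently of OFT itself: a dense d x d orthogonal matrix is composed from log2(d) sparse factors, each holding only d/2 rotation angles. The numpy sketch below is our own toy construction, not the paper's implementation; all names are ours.

```python
import numpy as np

def butterfly_orthogonal(angles):
    # Compose a d x d orthogonal matrix from log2(d) sparse "butterfly"
    # factors; factor `level` rotates each index pair (i, i XOR 2**level).
    levels, half = angles.shape
    d = 2 * half
    Q = np.eye(d)
    for level in range(levels):
        stride = 1 << level
        B = np.zeros((d, d))
        k = 0
        for i in range(d):
            j = i ^ stride
            if j < i:          # pair already handled
                continue
            c, s = np.cos(angles[level, k]), np.sin(angles[level, k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
        Q = B @ Q              # product of orthogonal factors stays orthogonal
    return Q

d = 8
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=(int(np.log2(d)), d // 2))
Q = butterfly_orthogonal(angles)
# 12 angles parameterize an exactly orthogonal 8 x 8 matrix (64 entries)
assert np.allclose(Q @ Q.T, np.eye(d))
```

Each factor is a disjoint set of 2D rotations, so it costs O(d) parameters, and the full product needs only O(d log d) instead of O(d^2).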
Yandong Wen
Perceiving Systems
Talk
Zhen Liu
12-10-2023
Ghost on the Shell: An Expressive Representation of General 3D Shapes
The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they enable 1) fast physics-based rendering with realistic material and lighting, 2) physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open,...
Yandong Wen
Perceiving Systems
Talk
Claudia Gallatz
17-08-2023
Face Exploration - Capture all Degrees of Freedom of the Face
High-quality data capture is decisive for scientific work. As a member of the data team, one of my core daily tasks is to ensure good quality standards in this field. My talk will shed light on the background of this work, starting from the scanner set-up and the corresponding data outcome, with a focus on the Face Scanner. This is work that every scientist can profit from for their personal projects. I will take the occasion to present our most recent face capture study, named FACE EXPLORATION, of which Timo Bolkart is the leading scientist. A selection of representative sequences including facial m...
Yandong Wen
Perceiving Systems
Talk
Yangyi Huang
13-07-2023
Full-body avatars from single images and textual guidance
The reconstruction of the full-body appearance of clothed humans from single-view RGB images is a crucial yet challenging task, primarily due to depth ambiguities and the absence of observations from unseen regions. While existing methods have shown impressive results, they still suffer from limitations such as over-smooth surfaces and blurry textures, particularly lacking detail on the backside of the avatar. In this talk, I will delve into how we have addressed these limitations by leveraging text guidance and pretrained text-image models, introducing two novel methods. Firstly, I will prese...
Hongwei Yi
Perceiving Systems
Talk
Bian Siyuan
13-04-2023
Pose, Kinematics, and Dynamics
Recovering accurate 3D human pose and shape from monocular input remains a challenging problem despite the rapid advancements powered by deep neural networks. Existing methods have limitations in achieving both robustness and mesh-image alignment, and the estimated pose suffers from physical artifacts such as foot sliding and body leaning. In this talk, we present two new methods to address these limitations. Firstly, we introduce NIKI, an inverse kinematics algorithm that utilizes an invertible neural network to model both the forward kinematics process and the inverse kinematics process. ...
Michael Black
Perceiving Systems
Talk
Lisa Dunlap
29-03-2023
Language is the key to robust vision systems
The ability to extend a model beyond the domain of the training data is central to building robust computer vision models. Methods for dealing with unseen test distributions or biased training data often require leveraging additional image data, but linguistic knowledge of the task and potential domain shifts is much cheaper and easier to obtain. In this talk, I will present three recent works that focus on different ways one can improve accuracy with language advice and incomplete training data via large-scale vision and language models.
Lea Müller
Perceiving Systems
Talk
Anurag Ranjan
23-02-2023
Neural Graphics in a Generative World
Recent years have seen significant advancements in deep learning, leading to a growing belief that Moore's law, which traditionally pertained to the packing of transistors, is now transitioning towards the improvement of photo-realistic 3D graphics. The advancements in this research field can be broadly categorized into two areas: neural fields, which are capable of modeling photo-realistic 3D representations, and diffusion models, which are able to generalize to large-scale data and produce photo-realistic images. To combine these technologies for large scale 3D generative modeling, ...
Sai Kumar Dwivedi
Perceiving Systems
Talk
Xi Wang
16-02-2023
What do language models tell us about human-object interaction?
Research in artificial intelligence (AI) continues to advance quickly, outperforming humans in many tasks and making its way into our daily lives. However, beneath this superior performance, current technologies are limited in how they perceive, process, and understand our visual world, and struggle to understand and interact with people. These issues raise the core question of my research: How do we build intelligent systems that can interact with people and offer assistance in a natural and seamless way? In this talk, I will present our recent works on using the CLIP model for object intera...
Muhammed Kocabas
Perceiving Systems
Talk
Mingyuan Zhang
19-01-2023
Human Motion Generation with Diffusion Models
Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages, speech, and music. However, it remains challenging to achieve diverse and fine-grained motion generation with comprehensive condition signals. Inspired by the success in image generation, recent works attempt to apply diffusion models to motion generation tasks (Motion Diffusion Models) and achieve impressive progress in as...
Shashank Tripathi
Perceiving Systems
Talk
Zhongang Cai
12-01-2023
Data Infrastructure for Scaling up Human Understanding and Modelling to the Real World
Human sensing and modelling are fundamental tasks in vision and graphics with numerous applications. However, due to the prohibitive cost, existing datasets are often limited in scale and diversity. This talk shares two of our recent works to tackle data scarcity. First, with the advances of new sensors and algorithms, paired data can be obtained from an inexpensive set-up and an automatic annotation pipeline. Specifically, we demonstrate the data collection solution by introducing HuMMan, a large-scale multimodal 4D human dataset. HuMMan has several appealing properties: 1) multimodal data...
Shashank Tripathi
Perceiving Systems
Talk
Yuge Shi
22-09-2022
Combine and conquer: representation learning from multiple data distributions
It is becoming less and less controversial to say that the days of learning representations through label supervision are over. Recent work has discovered that such regimes are not only expensive but also suffer from various generalisation and robustness issues. This is somewhat unsurprising, as perceptual data (vision, language) are rich and cannot be well represented by a single label; doing so inevitably results in the model learning spurious features that trivially correlate with the label.
In this talk, I will introduce my work during my PhD at Oxford, which looks at representation learning...
Yao Feng
Perceiving Systems
Talk
Alejandro Pardo
08-09-2022
Computer Vision for Automated Video Editing and Understanding.
Video content creation has boomed in recent years. Every day, hundreds of thousands of hours of video are uploaded to the internet. Thus, video content editing has become more popular and accessible to amateur users. However, current computer vision (CV) techniques have paid little attention to making video editing a less tedious task. Currently, editors spend hours cutting and stitching videos to deliver final edits that convey stories. This cutting process is creative but often repetitive. With the recent advances in CV, one would expect that a system could learn some cutti...
Hongwei Yi
Perceiving Systems
Talk
Zenghao Chai
04-08-2022
REALY: Rethinking the Evaluation of 3D Face Reconstruction
The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This poses difficulties for precisely diagnosing and improving a 3D face reconstruction method. In this paper, we propose a novel evaluation approach with a new benchmark REALY, consisting of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. Our approach perform...
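The alignment step whose sensitivity is discussed above is typically a rigid Procrustes/Kabsch fit. The minimal numpy sketch below illustrates that generic step, not the REALY benchmark code; the synthetic data and names are ours.

```python
import numpy as np

def rigid_align(src, tgt):
    # Least-squares rigid alignment (Kabsch): find R, t minimizing
    # ||src @ R.T + t - tgt|| for corresponding (N, 3) point sets.
    mu_s, mu_t = src.mean(0), tgt.mean(0)
    H = (src - mu_s).T @ (tgt - mu_t)
    U, _, Vt = np.linalg.svd(H)
    sign = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = mu_t - R @ mu_s
    return src @ R.T + t

rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 3))                  # stand-in "ground-truth scan"
theta = 0.3                                     # simulate a mis-posed prediction
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = gt @ Rz.T + np.array([0.1, -0.2, 0.05])

err_raw = np.linalg.norm(pred - gt, axis=1).mean()
err_aligned = np.linalg.norm(rigid_align(pred, gt) - gt, axis=1).mean()
# err_aligned is ~0 here; with real, non-rigid reconstruction errors, the
# choice of reference points used for alignment changes the reported error.
```

Because the simulated prediction differs from the scan only rigidly, alignment removes the error entirely; real reconstructions differ non-rigidly, which is exactly why the choice of alignment reference matters.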
Yandong Wen
Perceiving Systems
Talk
Lingchen Yang
28-07-2022
Implicit Neural Representation for Physics-driven Actuated Soft Bodies
Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks.
Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discre...
Yao Feng
Perceiving Systems
Talk
Supreeth Narasimhaswamy
28-07-2022
Understanding Human Hands in Visual Data
Hands are the central means by which humans interact with their surroundings. Understanding human hands helps human behavior analysis and facilitates other visual analysis tasks such as action and gesture recognition. Recently, there has been a surge of interest in understanding first-person visual data, in which hands are the dominant interaction entities. There has also been an explosion of interest in developing computer vision methods for augmented and virtual reality. To deliver an authentic augmented and virtual reality experience, we need to enable humans to interact with the
Sai Kumar Dwivedi
Dimitris Tzionas
Perceiving Systems
Talk
Michael Zollhoefer
27-07-2022
Complete Codec Telepresence
Imagine two people, each of them within their own home, being able to communicate and interact virtually with each other as if they are both present in the same shared physical space. Enabling such an experience, i.e., building a telepresence system that is indistinguishable from reality, is one of the goals of Reality Labs Research (RLR) in Pittsburgh. To this end, we develop key technology that combines fundamental computer vision, machine learning, and graphics techniques based on a novel neural reconstruction and rendering paradigm. In this talk, I will cover our advances towards a neur...
Yao Feng
Perceiving Systems
Talk
Rana Hanocka
13-06-2022
Shape editing, generation, and stylization
Manual authoring of 3D content is a laborious and tedious task. In this talk, I present some of 3DL's recent and on-going efforts toward building tools which provide intuitive control for editing, manipulating, and generating 3D shapes. I will discuss how recent advancements, such as joint vision-language embedding spaces can be used to stylize 3D objects, driven by natural language. Finally, I will conclude with ongoing and future work in this direction, as well as other related areas.
Omid Taheri
Perceiving Systems
Talk
Youngjoong Kwon
09-06-2022
Learning to create Digital Humans: Generalizable Radiance Fields for Human Performance Rendering
In this work, we aim at synthesizing a free-viewpoint video of an arbitrary human performance using sparse multi-view cameras. Recently, several works have addressed this problem by learning person-specific neural radiance fields (NeRF) to capture the appearance of a particular human. In parallel, some works have proposed using pixel-aligned features to generalize radiance fields to arbitrary new scenes and objects. Adopting such generalization approaches for humans, however, is highly challenging due to the heavy occlusions and dynamic articulation of body parts. To tackle this, we propose a no...
Yuliang Xiu
Perceiving Systems
Talk
Jiashi Feng
02-05-2022
Learning to estimate 3D human poses without labeled data
Estimating 3D human poses from images or videos is a fundamental task in computer vision. However, the limited availability of training data with high-quality 3D pose annotations largely hinders its development and deployment in real applications. In this talk, I will introduce our recent works on training 3D pose estimation models without requiring 3D labeled data. Our first step is to present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards greater diversity and thus improve the generalization of the trained 2D-to-3D pose estimator. Specifically, P...
Michael Black
Perceiving Systems
Talk
Lixin Yang
25-04-2022
Leverage Kinematic and Contact constraints for understanding hand-object interaction
My work focuses on inferring and understanding the human hand’s interaction with objects from visual inputs, covering several tasks such as pose estimation, grasping pose generation, and interacting pose transfer. Unlike single-body pose estimation, understanding hand-object (multi-body) interactions in 3D space is more challenging, due to the high degrees of articulation, projection ambiguity, self- and mutual occlusions, and complicated physical constraints. Designing algorithms to tackle these challenges is my goal. We find that mutual contact can provide rich
Yuliang Xiu
Perceiving Systems
Talk
Chunlu Li
19-04-2022
Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
3D face reconstruction under occlusions is highly challenging because of the large variability of the appearance and location of occluders. Currently, the most successful methods fit a 3D face model through inverse rendering and assume a given segmentation of the occluder to avoid fitting the occluder. However, the segmentation annotations are costly since training an occlusion segmentation model requires large amounts of annotated data. To overcome this, we introduce a model-based approach for 3D face reconstruction that is highly robust to occlusions but does not require any occlusion ann...
Victoria Fernandez Abrevaya
Perceiving Systems
Talk
Angela Yao
12-04-2022
Mixing Synthetic and Real-World Captures for RGB Hand Pose Estimation
How can we learn models for hand pose estimation without any (real-world) labels? This talk presents our recent efforts in tackling the challenging scenario of learning from labelled synthetic data and unlabelled real-world data. I will focus on two strategies that we find to be effective: (1) cross-modal consistency and alignment for representation learning and (2) pseudo-label corrections and refinement.
The second part of the talk will introduce Assembly101, our newly recorded dataset that tackles 3D hand pose and action understanding over time. Assembly101 is a new procedural activit...
Dimitris Tzionas
Perceiving Systems
Talk
Henry Clever
07-04-2022
Modeling Humans at Rest with Applications to Robotic Assistance
Humans spend a large part of their lives resting. Machine perception of this class of body poses would be beneficial to numerous applications, but it is complicated by line-of-sight occlusion from bedding. Pressure sensing mats are a promising alternative, but data is challenging to collect at scale. To overcome this, we use modern physics engines to simulate bodies resting on a soft bed with a pressure sensing mat. This method can efficiently generate data at scale for training deep neural networks. We present a deep model trained on this data that infers 3D human pose and body shape from ...
Dimitris Tzionas
Chun-Hao Paul Huang
Perceiving Systems
Talk
Sida Peng
07-04-2022
Reconstructing Static Scenes and Dynamic Humans with Implicit Neural Representations
3D reconstruction is a long-standing problem in computer vision and has a variety of applications such as virtual reality, 3D content generation, and telepresence. In this talk, I will present our progress on 3D reconstruction of static scenes and dynamic humans with implicit neural representations. The first part of the talk introduces an effective regularization when optimizing implicit neural representations on indoor scenes based on the Manhattan-world Assumption. In the second part, I will show some animatable implicit neural representations for modeling dynamic humans from videos.
Hongwei Yi
Perceiving Systems
Talk
Pinelopi Papalampidi
08-02-2022
Structure-aware Narrative Understanding and Summarization
In this work, we analyze and summarize full-length movies from multimodal input (i.e., video, text, audio). We first hypothesize that identifying the narrative structure of movies is a precondition for summarizing them. According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a movie that define the narrative structure and determine its progression and thematic units. Therefore, we introduce the task of Turning Point (TP) identification and leverage it for movie summarization and trailer generation. Next, we propos...
Nikos Athanasiou
Chun-Hao Paul Huang
Perceiving Systems
Talk
Ye Yuan
18-01-2022
Unified Simulation, Perception, and Generation of Human Behavior
Understanding and modeling human behavior is fundamental to almost any computer vision and robotics applications that involve humans. In this talk, I will present a holistic approach to human behavior modeling and tackle its three essential aspects --- simulation, perception, and generation. I will show how the three aspects are deeply connected and how utilizing and improving one aspect can greatly benefit the other aspects.
Since humans live in a physical world, we treat simulation as the foundation of our approach and start by developing a fundamental framework for representing human ...
Hongwei Yi
Perceiving Systems
Talk
Arianna Rampini
14-12-2021
Discrete inverse spectral geometry for shape analysis
Spectral quantities such as the eigenvalues of the Laplacian operator are widely used in geometry processing, since they provide a very informative summary of the intrinsic geometry of deformable shapes. Typically, the intrinsic properties of shapes are computed from their representation in 3D space and are used to encode compact geometric features, thus adopting a data-reduction principle. On the contrary, this talk focuses on the inverse problem: namely, recovering an extrinsic embedding from a purely intrinsic encoding, as in the classical “hearing the shape of the drum” problem. I will sta...
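As a toy version of the forward direction (computing an intrinsic spectral summary from a shape), the spectrum of a graph Laplacian already distinguishes an open path from a closed cycle. The numpy sketch below is our own illustration, with a graph Laplacian standing in for the Laplace-Beltrami operator, not the talk's method.

```python
import numpy as np

def graph_laplacian(edges, n):
    # Unnormalized graph Laplacian L = D - A, a discrete stand-in for the
    # Laplace-Beltrami operator whose eigenvalues summarize intrinsic shape.
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

n = 6
path = [(i, i + 1) for i in range(n - 1)]   # an open "shape"
cycle = path + [(n - 1, 0)]                 # the same shape, closed up

ev_path = np.linalg.eigvalsh(graph_laplacian(path, n))
ev_cycle = np.linalg.eigvalsh(graph_laplacian(cycle, n))
# both spectra start at 0 (the constant eigenvector) but otherwise differ:
# the eigenvalues "hear" whether the drum is open or closed
```

The inverse problem discussed in the talk runs the other way: given such a spectrum, recover an embedding that produces it.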
Silvia Zuffi
Perceiving Systems
Talk
Yajie Zhao
10-12-2021
Next Generation Lifelike Avatar Creation
High-fidelity avatar creation for films and games is tied to complex capture equipment, massive data, long production cycles, and intensive manual labor by a production team, and the result may still land in the notorious Uncanny Valley. In this talk, we will explore how to produce a lifelike avatar in a low-cost way. We will show how to leverage deep learning networks to accelerate and simplify the industrial avatar production procedure, from data capture to animation, and bring photorealism to the next level!
Timo Bolkart