Events & Talks
Perceiving Systems
Talk
Robin Courant
10-03-2025
How and what to film in virtual environments?
Content creation for movies and video games has been transformed with the rise of virtual environments, yet filming within these digital worlds remains a complex challenge. This talk explores the question: how and what to film in virtual environments? We examine the role of camera control and human interaction across different virtual settings, including NeRF, 3D engines, and video generation.
Victoria Fernandez Abrevaya
Perceiving Systems
Talk
Ailing Zeng
18-02-2025
The Dawn of Video Generation: Preliminary Explorations with SORA-like Models
High-quality video generation—encompassing text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) generation—plays a pivotal role in content creation and world simulation. While several DiT-based models have advanced rapidly in the past year, a thorough exploration of their capabilities, limitations, and alignment with human preferences remains incomplete. In this talk, I will present recent advancements in SORA-like T2V, I2V, and V2V models and products, bridging the gap between academic research and industry applications. Through live demonstrations and comparative analyses, ...
Nikos Athanasiou
Michael Black
Perceiving Systems
Talk
Yannis Siglidis
06-02-2025
Computer Vision at the Mirror Stage: Questioning and Refining Visual Categorization
Advancements in computer vision in predicting and visualizing labels often motivate us to take the relationship between labels and images as a given. Yet the prototypical nature of coherent label sets, such as the alphabet of handwritten characters, can help us question assumed families of handwritten variation.
Nikos Athanasiou
Perceiving Systems
Talk
Sergi Pujades
28-11-2024
How to predict the inside from the outside? Segment, register, model and infer!
Observing and modeling the human body has attracted scientific effort since early in history. In recent decades, though, several imaging modalities, such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and X-ray, have provided the means to “see” inside the body. Most interestingly, there is growing evidence that the shape of the surface of the human body is highly correlated with its internal properties, for example body composition, bone size, and the amount of muscle and adipose tissue (fat). In this talk I will go over ...
Marilyn Keller
Perceiving Systems
Talk
Guy Tevet
14-10-2024
Diffusion Models for Human Motion Synthesis
Character motion synthesis stands as a central challenge in computer animation and graphics. The successful adaptation of diffusion models to the field boosted synthesis quality and provided intuitive controls such as text and music.
One of the earliest and most popular methods to do so is Motion Diffusion Model (MDM) [ICLR 2023]. In this talk, I will review how MDM incorporates domain know-how into the diffusion model and enables intuitive editing capabilities.
Then, I will present two recent works, each suggesting a refreshing take on motion diffusion and extending its abilities to new...
Omid Taheri
Perceiving Systems
Talk
Egor Zakharov
10-10-2024
Reconstruction and Animation of Realistic Head Avatars
Digital humans, or realistic avatars, are a centerpiece of future telepresence and special effects systems, and human head modeling is one of their main components. These applications, however, are highly demanding in terms of avatar creation speed, realism, and controllability. This talk will focus on approaches that create controllable and detailed 3D head avatars using data from consumer-grade devices, such as smartphones, in an uncalibrated and unconstrained capture setting. We will discuss leveraging in-the-wild internet videos and synthetic data sources...
Vanessa Sklyarova
Perceiving Systems
Talk
Simon Donne
26-09-2024
Collaborative Control for Geometry-Conditioned PBR Image Generation
Current diffusion models only generate RGB images. If we want to make progress towards graphics-ready 3D content generation, we need a PBR foundation model, but there is not enough PBR data available to train such a model from scratch. We introduce Collaborative Control, which tightly links a new PBR diffusion model to a pre-trained RGB model. We show that this dual architecture does not risk catastrophic forgetting, outputting high-quality PBR images and generalizing well beyond the PBR training dataset. Furthermore, the frozen base model remains compatible with techniques such as IP-Adapter.
Soubhik Sanyal
Perceiving Systems
Talk
Slava Elizarov
26-09-2024
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
In this talk, I will present Geometry Image Diffusion (GIMDiffusion), a novel method designed to generate 3D objects from text prompts efficiently. GIMDiffusion uses geometry images, a 2D representation of 3D shapes, which allows the use of existing image-based architectures instead of complex 3D-aware models. This approach reduces computational costs and simplifies the model design. By incorporating Collaborative Control, the method exploits rich priors of pretrained Text-to-Image models like Stable Diffusion, enabling strong generalization even with limited 3D training data. GIMDiffusion ...
Soubhik Sanyal
Perceiving Systems
Talk
Wanyue Zhang
12-09-2024
Generalizable Object-aware Human Motion Synthesis
Data-driven virtual 3D character animation has recently witnessed remarkable progress. The realism of virtual characters is a core contributing factor to the quality of computer animations and user experience in immersive applications like games, movies, and VR/AR. However, existing automatic approaches for 3D virtual character motion synthesis supporting scene interactions do not generalize well to new objects outside training distributions, even when trained on extensive motion capture datasets with diverse objects and annotated interactions. In this talk, I will present ROAM, an alternat...
Nikos Athanasiou
Perceiving Systems
Talk
István Sárándi
22-08-2024
Real Virtual Humans
With the explosive growth of available training data, 3D human pose and shape estimation is on the verge of a transition to a data-centric paradigm. To leverage data at scale, we need flexible models trainable from heterogeneous data sources. To this end, our latest work, Neural Localizer Fields, seamlessly unifies different human pose and shape-related tasks and datasets through the ability, both at training and test time, to query any point of the human volume and obtain its estimated location in 3D, based on a single RGB image. We achieve this by learning a continuous neural field of b...
Marilyn Keller
Perceiving Systems
Talk
Jiawei Liu
25-07-2024
4D Dynamic Scene Reconstruction, Editing, and Generation.
People live in a 4D dynamic moving world. While videos serve as the most convenient medium to capture this dynamic world, they lack the capability to present the 4D nature of our world. Therefore, 4D video reconstruction, free-viewpoint rendering, and high-quality editing and generation offer innovative opportunities for content creation, virtual reality, telepresence, and robotics. Although promising, they also pose significant challenges in terms of efficiency, 4D motion and dynamics, temporal and subject consistency, and text-3D/video alignment. In light of these challenges, this talk wi...
Omid Taheri
Perceiving Systems
Talk
Angelica Lim
23-07-2024
Multimodal Social Signal Processing for Human-Robot Interaction
Science fiction has long promised us interfaces and robots that interact with us as smoothly as humans do - Rosie the Robot from The Jetsons, C-3PO from Star Wars, and Samantha from Her. Today, interactive robots and voice user interfaces are moving us closer to effortless, human-like interactions in the real world. In this talk, I will discuss the opportunities and challenges in finely analyzing, detecting and generating non-verbal communication in context, including gestures, gaze, auditory signals, and facial expressions. Specifically, I will discuss how we might allow robots and virtual...
Yao Feng
Michael Black
Perceiving Systems
Talk
Siheng Chen
18-07-2024
Integrating AI Agents into Human Lives via a Simulation Approach
With the rapid growth of AI techniques, we may witness the emergence of AI agents entering our lives, reminiscent of new species. Ensuring that these AI agents integrate well into human life will be a profound challenge. We need these agents to be highly performant, safe, and well aligned with human values. However, directly training and testing AI agents in real-world environments to guarantee their performance and safety is costly and can disrupt everyday life. Thus, we are exploring a simulation-based approach to incubate these AI agents. In this talk, we will highlight the role of si...
Yao Feng
Perceiving Systems
Talk
Boxiang Rong
18-07-2024
Recreating Real Garments in Virtual Space with Gaussian Splatting and GNNs
Recent advances in scene reconstruction with 3D Gaussian Splatting and cloth simulation with graph neural networks open the prospect of methods that reconstruct photorealistic virtual garments from visual observations. In this talk we will present our recently submitted paper, Gaussian Garments, in which we reconstruct simulation-ready photorealistic garments from multi-view videos. With the power of 3D Gaussian Splatting we are able to match three key aspects of real garments in virtual space: their geometry, appearance, and behavior. The resulting virtual garments can then be combined int...
Artur Grigorev
Perceiving Systems
Talk
Yafes Sahin
08-07-2024
Creating High-End Visuals with Real-Time Technology
Creating captivating 3D visuals, particularly photorealistic CGI, demands a diverse range of tools, techniques, and expertise, from concept design to the creation of entire 3D worlds. Linear content generation represents the highest standard of visual quality and has long been a source of inspiration for game developers. In this talk, we will explore the advancements in techniques that have contributed to the rise of real-time technologies in movies and game cinematics.
We will delve into projects created with Unreal Engine, such as The Matrix Awakens, Vaulted Halls Entombed (Netflix S...
Yao Feng
Perceiving Systems
Talk
Pranav Manu
04-07-2024
Text-Driven 3D Modeling of Avatars
Generating 3D objects poses notable challenges due to the limited availability of annotated 3D datasets, unlike their 2D counterparts. Current approaches often resort to models trained on 2D data, resulting in prolonged optimization phases. Conversely, models trained on 3D datasets enable inference without optimization but suffer from limited dataset diversity. This talk explores methodologies for generative 3D modelling of human heads and garments, pivotal for human avatar creation. First, we introduce "Clip-Head," a text-to-textured 3D head generation model that generates a textured NPHM ...
Victoria Fernandez Abrevaya
Perceiving Systems
Talk
Shixiang Tang
10-06-2024
Towards Human-Centric Foundation Models: Pretraining Datasets and Unified Architectures
Recent years have witnessed great research interest in human-centric visual computing, such as person re-identification in social surveillance, mesh recovery in the Metaverse, and pedestrian detection in autonomous driving. The recent development of large models offers the opportunity to unify these human-centric tasks and achieve improved performance by merging public datasets from different tasks. This talk will present our recent work on developing human-centric unified models for 2D vision, 3D vision, skeleton-based, and vision-language tasks. We hope our model will be integrated into the curre...
Yandong Wen
Perceiving Systems
Talk
Shengqu Cai
02-05-2024
Generative Rendering and Beyond
Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models (SORA). Despite great promise, video diffusion models are difficult to control, hindering users from applying their own creativity rather than amplifying it. In this talk, we present a novel approach called Generative Rendering that combines the controllability of dynamic 3D me...
Shrisha Bharadwaj
Michael Black
Perceiving Systems
Talk
Maria Korosteleva
04-04-2024
Modeling and Reconstructing Garments with Sewing Patterns
The problems of creating new garments (modeling) and reproducing existing ones (reconstruction) appear in various fields, from fashion production to digital human modeling for the metaverse. The talk introduces approaches to a novel garment creation paradigm: programming-based parametric sewing pattern construction and its application to generating rich synthetic datasets of garments with sewing patterns. We will then discuss how the availability of ground-truth sewing patterns allows posing the learning-based garment reconstruction problem as sewing pattern recovery. Such reformulatio...
Yao Feng
Michael Black
Perceiving Systems
Talk
Qixing Huang
13-03-2024
Geometric Regularizations for 3D Shape Generation
Generative models, which map a latent parameter space to instances in an ambient space, enjoy various applications in 3D Vision and related domains. A standard scheme of these models is probabilistic, which aligns the induced ambient distribution of a generative model from a prior distribution of the latent space with the empirical ambient distribution of training instances. While this paradigm has proven to be quite successful on images, its current applications in 3D generation encounter fundamental challenges in the limited training data and generalization behavior. The key difference be...
Yuliang Xiu
Perceiving Systems
Talk
Luming Tang
18-01-2024
Mining Visual Knowledge from Large Pre-trained Models
Computer vision has made huge progress in the past decade with the dominant supervised learning paradigm, that is, training large-scale neural networks on each task with ever larger datasets. However, in many cases, scalable data or annotation collection is intractable. In contrast, humans can easily adapt to new vision tasks with very little data or few labels. To bridge this gap, we found that rich visual knowledge actually exists in large pre-trained models, i.e., models trained on scalable internet images with either self-supervised or generative objectives, and we proposed differ...
Yuliang Xiu
Yandong Wen
Perceiving Systems
Talk
Partha Ghosh
30-11-2023
RAVEN: Rethinking Adversarial Video generation with Efficient tri-plane Networks
We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies. To capture these dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a singular latent code to model an entire video sequence. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy reduces computational complexity b...
Yandong Wen
Perceiving Systems
Talk
Weiyang Liu
19-10-2023
Orthogonal Butterfly: Parameter-Efficient Orthogonal Adaptation of Foundation Models via Butterfly Factorization
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a ...
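The parameter savings of a butterfly factorization can be illustrated independently of OFT itself: a dense d x d orthogonal matrix is composed from log2(d) sparse factors, each holding only d/2 rotation angles. The numpy sketch below is our own toy construction, not the paper's implementation; all names are ours.

```python
import numpy as np

def butterfly_orthogonal(angles):
    # Compose a d x d orthogonal matrix from log2(d) sparse "butterfly"
    # factors; factor `level` rotates each index pair (i, i XOR 2**level).
    levels, half = angles.shape
    d = 2 * half
    Q = np.eye(d)
    for level in range(levels):
        stride = 1 << level
        B = np.zeros((d, d))
        k = 0
        for i in range(d):
            j = i ^ stride
            if j < i:          # pair already handled
                continue
            c, s = np.cos(angles[level, k]), np.sin(angles[level, k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
        Q = B @ Q              # product of orthogonal factors stays orthogonal
    return Q

d = 8
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=(int(np.log2(d)), d // 2))
Q = butterfly_orthogonal(angles)
# 12 angles parameterize an exactly orthogonal 8 x 8 matrix (64 entries)
assert np.allclose(Q @ Q.T, np.eye(d))
```

Each factor is a disjoint set of 2D rotations, so it costs O(d) parameters, and the full product needs only O(d log d) instead of O(d^2).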
Yandong Wen
Perceiving Systems
Talk
Zhen Liu
12-10-2023
Ghost on the Shell: An Expressive Representation of General 3D Shapes
The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they enable 1) fast physics-based rendering with realistic material and lighting, 2) physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open,...
Yandong Wen
Perceiving Systems
Talk
Claudia Gallatz
17-08-2023
Face Exploration - Capture all Degrees of Freedom of the Face
High-quality data capture is decisive for scientific work. As a member of the data team, one of my core daily tasks is to ensure good quality standards in this field. My talk will shed light on the background of this work, starting from the scanner set-up and the corresponding data outcome, with a focus on the Face Scanner. This is work that every scientist can profit from for their personal projects. I will take the occasion to present our most recent face capture study, named FACE EXPLORATION, of which Timo Bolkart is the leading scientist. A selection of representative sequences including facial m...
Yandong Wen
Perceiving Systems
Talk
Yangyi Huang
13-07-2023
Full-body avatars from single images and textual guidance
The reconstruction of the full-body appearance of clothed humans from single-view RGB images is a crucial yet challenging task, primarily due to depth ambiguities and the absence of observations from unseen regions. While existing methods have shown impressive results, they still suffer from limitations such as over-smooth surfaces and blurry textures, particularly lacking detail on the backside of the avatar. In this talk, I will delve into how we have addressed these limitations by leveraging text guidance and pretrained text-image models, introducing two novel methods. Firstly, I will prese...
Hongwei Yi
Perceiving Systems
Talk
Bian Siyuan
13-04-2023
Pose, Kinematics, and Dynamics
Recovering accurate 3D human pose and shape from monocular input remains a challenging problem despite the rapid advancements powered by deep neural networks. Existing methods have limitations in achieving both robustness and mesh-image alignment, and the estimated pose suffers from physical artifacts such as foot sliding and body leaning. In this talk, we present two new methods to address these limitations. Firstly, we introduce NIKI, an inverse kinematics algorithm that utilizes an invertible neural network to model both the forward kinematics process and the inverse kinematics process. ...
Michael Black
Perceiving Systems
Talk
Lisa Dunlap
29-03-2023
Language is the key to robust vision systems
The ability to extend a model beyond the domain of the training data is central to building robust computer vision models. Methods for dealing with unseen test distributions or biased training data often require leveraging additional image data, but linguistic knowledge of the task and potential domain shifts is much cheaper and easier to obtain. In this talk, I will present three recent works that focus on different ways one can improve accuracy with language advice and incomplete training data via large-scale vision and language models.
Lea Müller
Perceiving Systems
Talk
Anurag Ranjan
23-02-2023
Neural Graphics in a Generative World
Recent years have seen significant advancements in deep learning, leading to a growing belief that Moore's law, which traditionally pertained to the packing of transistors, is now transitioning towards the improvement of photo-realistic 3D graphics. The advancements in this research field can be broadly categorized into two areas: neural fields, which are capable of modeling photo-realistic 3D representations, and diffusion models, which are able to generalize to large-scale data and produce photo-realistic images. To combine these technologies for large scale 3D generative modeling, ...
Sai Kumar Dwivedi
Perceiving Systems
Talk
Xi Wang
16-02-2023
What do language models tell us about human-object interaction?
Research in artificial intelligence (AI) continues to advance quickly, outperforming humans in many tasks and making its way into our daily lives. However, beneath this superior performance, current technologies are limited in how they perceive, process, and understand our visual world, and struggle to understand and interact with people. These issues raise the core question of my research: How do we build intelligent systems that can interact with people and offer assistance in a natural and seamless way? In this talk, I will present our recent works on using the CLIP model for object intera...
Muhammed Kocabas
Perceiving Systems
Talk
Mingyuan Zhang
19-01-2023
Human Motion Generation with Diffusion Models
Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages, speech, and music. However, it remains challenging to achieve diverse and fine-grained motion generation with comprehensive condition signals. Inspired by the success in image generation, recent works attempt to apply diffusion models to motion generation tasks (Motion Diffusion Models) and achieve impressive progress in as...
Shashank Tripathi
Perceiving Systems
Talk
Zhongang Cai
12-01-2023
Data Infrastructure for Scaling up Human Understanding and Modelling to the Real World
Human sensing and modelling are fundamental tasks in vision and graphics with numerous applications. However, due to the prohibitive cost, existing datasets are often limited in scale and diversity. This talk shares two of our recent works to tackle data scarcity. First, with the advances of new sensors and algorithms, paired data can be obtained from an inexpensive set-up and an automatic annotation pipeline. Specifically, we demonstrate the data collection solution by introducing HuMMan, a large-scale multimodal 4D human dataset. HuMMan has several appealing properties: 1) multimodal data...
Shashank Tripathi
Perceiving Systems
Talk
Yuge Shi
22-09-2022
Combine and conquer: representation learning from multiple data distributions
It is becoming less and less controversial to say that the days of learning representations through label supervision are over. Recent work has discovered that such regimes are not only expensive but also suffer from various generalisation and robustness issues. This is somewhat unsurprising, as perceptual data (vision, language) are rich and cannot be well represented by a single label; doing so inevitably results in the model learning spurious features that trivially correlate with the label.
In this talk, I will introduce my work during my PhD at Oxford, which looks at representation learning...
Yao Feng
Perceiving Systems
Talk
Alejandro Pardo
08-09-2022
Computer Vision for Automated Video Editing and Understanding.
Video content creation has boomed in recent years. Every day, hundreds of thousands of hours of video are uploaded to the internet. Thus, video content editing has become more popular and accessible to amateur users. However, current computer vision (CV) techniques have paid little attention to making video editing a less tedious task. Currently, editors spend hours cutting and stitching videos to deliver final edits that convey stories. This cutting process is creative but often repetitive. With the recent advances in CV, one would expect that a system could learn some cutti...
Hongwei Yi
Perceiving Systems
Talk
Zenghao Chai
04-08-2022
REALY: Rethinking the Evaluation of 3D Face Reconstruction
The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This poses difficulties for precisely diagnosing and improving a 3D face reconstruction method. In this paper, we propose a novel evaluation approach with a new benchmark REALY, consisting of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. Our approach perform...
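The alignment step whose sensitivity is discussed above is typically a rigid Procrustes/Kabsch fit. The minimal numpy sketch below illustrates that generic step, not the REALY benchmark code; the synthetic data and names are ours.

```python
import numpy as np

def rigid_align(src, tgt):
    # Least-squares rigid alignment (Kabsch): find R, t minimizing
    # ||src @ R.T + t - tgt|| for corresponding (N, 3) point sets.
    mu_s, mu_t = src.mean(0), tgt.mean(0)
    H = (src - mu_s).T @ (tgt - mu_t)
    U, _, Vt = np.linalg.svd(H)
    sign = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = mu_t - R @ mu_s
    return src @ R.T + t

rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 3))                  # stand-in "ground-truth scan"
theta = 0.3                                     # simulate a mis-posed prediction
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = gt @ Rz.T + np.array([0.1, -0.2, 0.05])

err_raw = np.linalg.norm(pred - gt, axis=1).mean()
err_aligned = np.linalg.norm(rigid_align(pred, gt) - gt, axis=1).mean()
# err_aligned is ~0 here; with real, non-rigid reconstruction errors, the
# choice of reference points used for alignment changes the reported error.
```

Because the simulated prediction differs from the scan only rigidly, alignment removes the error entirely; real reconstructions differ non-rigidly, which is exactly why the choice of alignment reference matters.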
Yandong Wen
Perceiving Systems
Talk
Lingchen Yang
28-07-2022
Implicit Neural Representation for Physics-driven Actuated Soft Bodies
Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks.
Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discre...
Yao Feng
Perceiving Systems
Talk
Supreeth Narasimhaswamy
28-07-2022
Understanding Human Hands in Visual Data
Hands are the central means by which humans interact with their surroundings. Understanding human hands helps human behavior analysis and facilitates other visual analysis tasks such as action and gesture recognition. Recently, there has been a surge of interest in understanding first-person visual data, in which hands are the dominant interaction entities. There has also been an explosion of interest in developing computer vision methods for augmented and virtual reality. To deliver an authentic augmented and virtual reality experience, we need to enable humans to interact with the
Sai Kumar Dwivedi
Dimitris Tzionas
Perceiving Systems
Talk
Michael Zollhoefer
27-07-2022
Complete Codec Telepresence
Imagine two people, each of them within their own home, being able to communicate and interact virtually with each other as if they are both present in the same shared physical space. Enabling such an experience, i.e., building a telepresence system that is indistinguishable from reality, is one of the goals of Reality Labs Research (RLR) in Pittsburgh. To this end, we develop key technology that combines fundamental computer vision, machine learning, and graphics techniques based on a novel neural reconstruction and rendering paradigm. In this talk, I will cover our advances towards a neur...
Yao Feng
Perceiving Systems
Talk
Rana Hanocka
13-06-2022
Shape editing, generation, and stylization
Manual authoring of 3D content is a laborious and tedious task. In this talk, I present some of 3DL's recent and on-going efforts toward building tools which provide intuitive control for editing, manipulating, and generating 3D shapes. I will discuss how recent advancements, such as joint vision-language embedding spaces can be used to stylize 3D objects, driven by natural language. Finally, I will conclude with ongoing and future work in this direction, as well as other related areas.
Omid Taheri
Perceiving Systems
Talk
Youngjoong Kwon
09-06-2022
Learning to create Digital Humans: Generalizable Radiance Fields for Human Performance Rendering
In this work, we aim at synthesizing a free-viewpoint video of an arbitrary human performance using sparse multi-view cameras. Recently, several works have addressed this problem by learning person-specific neural radiance fields (NeRF) to capture the appearance of a particular human. In parallel, some works have proposed using pixel-aligned features to generalize radiance fields to arbitrary new scenes and objects. Adopting such generalization approaches for humans, however, is highly challenging due to the heavy occlusions and dynamic articulation of body parts. To tackle this, we propose a no...
Yuliang Xiu
Perceiving Systems
Talk
Jiashi Feng
02-05-2022
Learning to estimate 3D human poses without labeled data
Estimating 3D human poses from images or videos is a fundamental task in computer vision. However, the limited availability of training data with high-quality 3D pose annotations largely hinders its development and deployment in real applications. In this talk, I will introduce our recent works on training 3D pose estimation models without requiring 3D labeled data. Our first step is to present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards greater diversity and thus improve the generalization of the trained 2D-to-3D pose estimator. Specifically, P...
Michael Black
Perceiving Systems
Talk
Lixin Yang
25-04-2022
Leverage Kinematic and Contact constraints for understanding hand-object interaction
My work focuses on inferring and understanding the human hand’s interaction with objects from visual inputs, covering several tasks such as pose estimation, grasping pose generation, and interacting pose transfer. Unlike single-body pose estimation, understanding hand-object (multi-body) interactions in 3D space is more challenging, due to the high degrees of articulation, projection ambiguity, self- and mutual occlusions, and complicated physical constraints. Designing algorithms to tackle these challenges is my goal. We find that mutual contact can provide rich
Yuliang Xiu
Perceiving Systems
Talk
Chunlu Li
19-04-2022
Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
3D face reconstruction under occlusions is highly challenging because of the large variability of the appearance and location of occluders. Currently, the most successful methods fit a 3D face model through inverse rendering and assume a given segmentation of the occluder to avoid fitting the occluder. However, the segmentation annotations are costly since training an occlusion segmentation model requires large amounts of annotated data. To overcome this, we introduce a model-based approach for 3D face reconstruction that is highly robust to occlusions but does not require any occlusion ann...
Victoria Fernandez Abrevaya
Perceiving Systems
Talk
Angela Yao
12-04-2022
Mixing Synthetic and Real-World Captures for RGB Hand Pose Estimation
How can we learn models for hand pose estimation without any (real-world) labels? This talk presents our recent efforts in tackling the challenging scenario of learning from labelled synthetic data and unlabelled real-world data. I will focus on two strategies that we find to be effective: (1) cross-modal consistency and alignment for representation learning and (2) pseudo-label corrections and refinement.
The second part of the talk will introduce Assembly101, our newly recorded dataset that tackles 3D hand pose and action understanding over time. Assembly101 is a new procedural activit...
Dimitris Tzionas
Perceiving Systems
Talk
Henry Clever
07-04-2022
Modeling Humans at Rest with Applications to Robotic Assistance
Humans spend a large part of their lives resting. Machine perception of this class of body poses would be beneficial to numerous applications, but it is complicated by line-of-sight occlusion from bedding. Pressure sensing mats are a promising alternative, but data is challenging to collect at scale. To overcome this, we use modern physics engines to simulate bodies resting on a soft bed with a pressure sensing mat. This method can efficiently generate data at scale for training deep neural networks. We present a deep model trained on this data that infers 3D human pose and body shape from ...
Dimitris Tzionas
Chun-Hao Paul Huang
Perceiving Systems
Talk
Sida Peng
07-04-2022
Reconstructing Static Scenes and Dynamic Humans with Implicit Neural Representations
3D reconstruction is a long-standing problem in computer vision and has a variety of applications such as virtual reality, 3D content generation, and telepresence. In this talk, I will present our progress on 3D reconstruction of static scenes and dynamic humans with implicit neural representations. The first part of the talk introduces an effective regularization when optimizing implicit neural representations on indoor scenes based on the Manhattan-world Assumption. In the second part, I will show some animatable implicit neural representations for modeling dynamic humans from videos.
Hongwei Yi
Perceiving Systems
Talk
Pinelopi Papalampidi
08-02-2022
Structure-aware Narrative Understanding and Summarization
In this work, we analyze and summarize full-length movies from multimodal input (i.e., video, text, audio). We first hypothesize that identifying the narrative structure of movies is a precondition for summarizing them. According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a movie that define the narrative structure and determine its progression and thematic units. Therefore, we introduce the task of Turning Point (TP) identification and leverage it for movie summarization and trailer generation. Next, we propos...
Nikos Athanasiou
Chun-Hao Paul Huang
Perceiving Systems
Talk
Ye Yuan
18-01-2022
Unified Simulation, Perception, and Generation of Human Behavior
Understanding and modeling human behavior is fundamental to almost any computer vision and robotics applications that involve humans. In this talk, I will present a holistic approach to human behavior modeling and tackle its three essential aspects --- simulation, perception, and generation. I will show how the three aspects are deeply connected and how utilizing and improving one aspect can greatly benefit the other aspects.
Since humans live in a physical world, we treat simulation as the foundation of our approach and start by developing a fundamental framework for representing human ...
Hongwei Yi
Perceiving Systems
Talk
Arianna Rampini
14-12-2021
Discrete inverse spectral geometry for shape analysis
Spectral quantities such as the eigenvalues of the Laplacian operator are widely used in geometry processing, since they provide a very informative summary of the intrinsic geometry of deformable shapes. Typically, the intrinsic properties of shapes are computed from their representation in 3D space and are used to encode compact geometric features, thus adopting a data-reduction principle. On the contrary, this talk focuses on the inverse problem: namely, recovering an extrinsic embedding from a purely intrinsic encoding, as in the classical “hearing the shape of the drum” problem. I will sta...
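As a toy version of the forward direction (computing an intrinsic spectral summary from a shape), the spectrum of a graph Laplacian already distinguishes an open path from a closed cycle. The numpy sketch below is our own illustration, with a graph Laplacian standing in for the Laplace-Beltrami operator, not the talk's method.

```python
import numpy as np

def graph_laplacian(edges, n):
    # Unnormalized graph Laplacian L = D - A, a discrete stand-in for the
    # Laplace-Beltrami operator whose eigenvalues summarize intrinsic shape.
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

n = 6
path = [(i, i + 1) for i in range(n - 1)]   # an open "shape"
cycle = path + [(n - 1, 0)]                 # the same shape, closed up

ev_path = np.linalg.eigvalsh(graph_laplacian(path, n))
ev_cycle = np.linalg.eigvalsh(graph_laplacian(cycle, n))
# both spectra start at 0 (the constant eigenvector) but otherwise differ:
# the eigenvalues "hear" whether the drum is open or closed
```

The inverse problem discussed in the talk runs the other way: given such a spectrum, recover an embedding that produces it.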
Silvia Zuffi
Perceiving Systems
Talk
Yajie Zhao
10-12-2021
Next Generation Lifelike Avatar Creation
High-fidelity avatar creation for films and games is tied to complex capture equipment, massive data, long production cycles, and intensive manual labor by a production team, and the result may still land in the notorious Uncanny Valley. In this talk, we will explore how to produce a lifelike avatar in a low-cost way. We will show how to leverage deep learning networks to accelerate and simplify the industrial avatar production procedure, from data capture to animation, and bring photorealism to the next level!
Timo Bolkart