Institute Talks
Special Talk: Verifiable Approaches to Trustworthy Machine Learning: Lessons from Researching Unlearning
- 12 March 2025 • 11:00—12:30
- Nicolas Papernot
- Tübingen
The talk presents open problems in the study of trustworthy machine learning. We begin by broadly characterizing the attack surface of modern machine learning algorithms. We then illustrate the challenge of enabling end users to trust that a machine learning algorithm was deployed responsibly, i.e., to verify its trustworthiness, through a deep dive into the problem of unlearning. The need for machine unlearning, i.e., obtaining the model one would have gotten without training on a subset of the data, arises from privacy legislation and as a potential remedy for data poisoning or copyright claims. As we present different approaches to unlearning, it becomes clear that they fail to answer the following question: how can end users verify that unlearning was successful? We show how an entity can claim plausible deniability when challenged about an unlearning request it claims to have processed, and conclude that, at the level of model weights, being unlearnt is not always a well-defined property. Put another way, we find that unlearning is an algorithmic property. Taking a step back, we draw lessons for the broader area of trustworthy machine learning. Our insight is that, in order for companies, regulators, and countries to verify meaningful properties at the scale required for stable governance of AI algorithms both nationally and internationally, ML algorithms need to be co-designed with cryptographic protocols.
Organizers: Moritz Hardt, Eva Laemmerhirt, Nisha Tyagi
The Dawn of Video Generation: Preliminary Explorations with SORA-like Models
- 18 February 2025 • 16:00—17:00
- Ailing Zeng
- Virtual
High-quality video generation—encompassing text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) generation—plays a pivotal role in content creation and world simulation. While several DiT-based models have advanced rapidly in the past year, a thorough exploration of their capabilities, limitations, and alignment with human preferences is still missing. In this talk, I will present recent advancements in SORA-like T2V, I2V, and V2V models and products, bridging the gap between academic research and industry applications. Through live demonstrations and comparative analyses, I will highlight key insights across four core dimensions: i) impact on vertical-domain applications, such as human-centric animation and robotics; ii) core capabilities, including text alignment, motion diversity, composition, and stability; iii) performance across ten real-world scenarios, showcasing practical utility; and iv) future potential, including usage scenarios, challenges, and directions for further research. Additionally, I will discuss recent advancements in automatic evaluation methods for generated videos, which leverage multimodal large language models to better keep pace with the rapid development of generative and understanding models.
Organizers: Nikos Athanasiou, Michael Black
Computer Vision at the Mirror Stage: Questioning and Refining Visual Categorization
- 06 February 2025 • 14:00—15:00
- Yannis Siglidis
Advances in computer vision for predicting and visualizing labels often motivate us to take the relationship between labels and images as a given. Yet the prototypical nature of coherent labels, such as the alphabet of handwritten characters, can help us question assumed families of handwritten variation. At the same time, conceptual categories such as the name of a country, if properly assigned to images, can provide a useful benchmark for state-of-the-art computer vision models. Further, these datasets can be mined with synthesis methods to reveal patterns of hidden visual vocabularies that help improve our (geographical) understanding of the data. The goal of this talk is to motivate rethinking labels in a bidirectional way, aiming to create systems that inform how humans discretize their visual world.
Organizers: Nikos Athanasiou
Scene Understanding through Space and Time: Novel Priors for 3D Reconstruction and Physical Dynamics
- 03 February 2025 • 12:00—13:00
- Soumava Paul
- Virtual (Zoom)
This talk explores novel approaches to understanding and reconstructing scenes across both spatial and temporal dimensions. Extrapolating a scene from limited observations requires generative priors to synthesize 3D content in unobserved regions. The existing 3D generative literature relies on 3D-aware image or video diffusion models, which require pretraining on million-scale real and synthetic 3D datasets. To address this challenge, we present low-cost generative techniques built on 2D diffusion priors that require only small-scale fine-tuning on multiview data. These fine-tuned priors can rectify novel-view renders and depth maps by inpainting missing details and removing artifacts that arise from 3D representations fitted to sparse inputs. Through autoregressive fusion of multiple novel views, we build multiview-consistent 3D representations that perform competitively with state-of-the-art methods on complex 360° scenes from the MipNeRF360 dataset. Building upon this foundation of static scene understanding, we extend our investigation to dynamic scenes where physical laws govern object interactions. While current video diffusion models like OpenAI's Sora can generate visually compelling sequences, they often fail to capture underlying physical constraints due to their purely data-driven training objectives. As a result, the generated videos often lack physical plausibility. To address this limitation, we introduce a 4D dataset with per-frame force annotations that makes explicit the physical interactions driving object motion in scenes. Our physical simulator can both animate objects in static 3D scenes and record particle-level forces at each timestep. This dataset aims to enable the development of physics-informed video diffusion priors, marking a step toward more physically accurate world simulators.
Organizers: Omid Taheri
Capturing and Recognizing Multimodal Surface Interactions as Embedded High-Dimensional Distributions
- 15 January 2025 • 09:00—09:30
- Behnam Khojasteh
- "Lyapunov" room 2.255 at the University of Stuttgart
Exploring a surface with a handheld tool generates complex contact signals that uniquely encode the surface's properties—a needle hidden in a haystack of data. Humans naturally integrate visual, auditory, and haptic sensory data during these interactions to accurately assess and recognize surfaces. However, enabling artificial systems to perceive and recognize surfaces with human-like proficiency remains a significant challenge. The complexity and dimensionality of multimodal sensor data, particularly in the intricate and dynamic modality of touch, hinder effective sensing and processing. Successfully overcoming these challenges will open up new possibilities in applications such as quality control, material documentation, and robotics. This dissertation addresses these issues at the level of both the sensing hardware and the processing algorithms by introducing an automated similarity framework for multimodal surface recognition, developing a haptic-auditory test bed for acquiring high-quality surface data, and exploring optimal sensing configurations to improve recognition performance and robustness.
Organizers: Katherine Kuchenbecker, Behnam Khojasteh
Precision Haptics in Gait Retraining for Knee Osteoarthritis
- 17 December 2024 • 15:00—16:00
- Nataliya Rokhmanova
- Zoom
Gait retraining, or teaching patients to walk in ways that reduce joint loading, shows promise as a conservative intervention for knee osteoarthritis. However, its use in clinical settings remains limited by challenges in prescribing optimal gait patterns and delivering precise, real-time biofeedback. This thesis presents four interconnected studies that aim to address these barriers to clinical adoption: First, a regression model was developed to predict patient-specific biomechanical responses to a gait modification using only simple clinical measures, reducing the need for instrumented gait analysis. Second, we identified how inertial sensor accuracy fundamentally impacts motor learning outcomes during gait retraining, demonstrating the importance of reliable kinematic tracking. Third, we designed and validated an open-source wearable haptic platform called ARIADNE, which delivers precise vibrotactile motion guidance and enables rigorous comparison of feedback strategies for gait retraining. This platform's integrated sensing revealed how anatomical placement and tissue properties influence vibration transmission and perception. Finally, a gait retraining study demonstrated that vibrotactile feedback significantly improves both learning and retention of therapeutic gait patterns compared to verbal instruction alone, highlighting the critical role of precise biofeedback systems in rehabilitation. These contributions help advance the field's understanding of the sensorimotor principles underlying gait retraining while providing practical tools to support future clinical implementation.
Organizers: Katherine Kuchenbecker, Nataliya Rokhmanova
Next-Generation Biohybrids: Engineering Miniature Machines Inspired by Plant Systems
- 10 December 2024 • 11:00—12:00
- Dr. Isabella Fiorello
- Hybrid - Webex plus in-person attendance in Copper (2R04)
Among living organisms, plants are an ideal source of inspiration for robotics and engineering due to their remarkable evolutionary adaptations to almost every habitat. When miniaturized, plant-inspired machines can navigate confined spaces and complex unstructured surfaces. We introduce a new class of plant-inspired, microfabricated hybrid machines designed for multifunctional tasks such as in situ monitoring and targeted cargo delivery. These machines combine bioinspired design with biohybrid approaches, incorporating the morphological and biomechanical features of both terrestrial and aquatic plants. Advanced techniques such as microcomputed tomography, two-photon lithography, and bioprinting enable the production of scalable and sustainable prototypes. Tested in real-world environments (such as soil, leaf tissues, and aquatic habitats), these machines have demonstrated their potential in applications like climbing robots, precision agriculture, reforestation, and underwater sensing. These technologies showcase the promise of plant-inspired biohybrid machines in environmental protection, conservation, and advanced engineering, with significant implications for fields such as materials science, soft robotics, and precision agriculture.
Organizers: Katherine Kuchenbecker, Christoph Keplinger
How to predict the inside from the outside? Segment, register, model and infer!
- 28 November 2024 • 10:00—11:00
- Sergi Pujades
- MPI IS Tuebingen, 3rd floor, Aquarium
Observing and modeling the human body has attracted scientific effort since the earliest times in history. In recent decades, though, imaging modalities such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and X-ray have provided the means to “see” inside the body. Most interestingly, there is growing evidence that the shape of the surface of the human body is highly correlated with its internal properties, for example, body composition, the size of the bones, and the amount of muscle and adipose tissue (fat). In this talk I will go over the methodology used to establish the link between the shape of the surface of the body and the internal anatomical structures, based on the classical problems of segmentation, registration, statistical modeling, and inference.
Organizers: Marilyn Keller
Data-Driven Needle Puncture Detection for Urgent Medical Care Delivery in Space
- 23 October 2024 • 17:30—18:00
- Rachael L'Orsa
- Zoom
Needle decompression (ND) is a surgical procedure that treats one of the most preventable causes of trauma-related death: dangerous accumulations of air between the chest wall and the lungs. However, needle-tip overshoot of the target space can result in the inadvertent puncture of critical structures like the heart. This type of complication is fatal without urgent surgical care, which is not available in resource-poor environments like space. Since ND is performed blind, operators rely on tool sensations to identify when the needle has reached its target. Needle instrumentation could enable puncture notifications to help operators limit tool-tip overshoot, but such a solution requires reliable puncture detection from manual (i.e., variable-velocity) needle insertion data streams. Data-driven puncture detection (DDPD) algorithms are appropriate for this application, but their performance has historically been unacceptably low for use in safety-critical applications. We contribute toward the development of an intelligent device for manual ND assistance by proposing two novel DDPD algorithms. Three data sets are collected that provide needle forces, torques, and displacements during insertions into ex vivo porcine tissue analogs for the human chest, and factors affecting DDPD algorithm performance are analyzed in these data. Puncture event features are examined for each sensor, and the suitability of accelerometer measurements and diffuse reflectance is evaluated for ND. Finally, DDPD ensembles are proposed that yield a 5.1-fold improvement in precision compared to the traditional force-only DDPD approach. These results lay a foundation for improving the urgent delivery of percutaneous procedures in space and other resource-poor settings.
Organizers: Katherine Kuchenbecker, Rachael L'Orsa
The Atomic Human: Understanding ourselves in the age of AI
- 17 October 2024 • 16:00—18:00
- Neil Lawrence
- Lecture Hall 2D5, Heisenbergstraße 1, Stuttgart
The Max Planck Institute for Intelligent Systems is delighted to invite you to its 2024 Max Planck Lecture in Stuttgart.
Organizers: Michael Black, Barbara Kettemann, Valeria Rojas
- Guy Tevet
- MPI-IS Tuebingen, N3.022
Character motion synthesis stands as a central challenge in computer animation and graphics. The successful adaptation of diffusion models to the field has boosted synthesis quality and provided intuitive controls such as text and music. One of the earliest and most popular such methods is the Motion Diffusion Model (MDM) [ICLR 2023]. In this talk, I will review how MDM incorporates domain know-how into the diffusion model and enables intuitive editing capabilities. Then, I will present two recent works, each suggesting a refreshing take on motion diffusion and extending its abilities to new animation tasks. Multi-view Ancestral Sampling (MAS) [CVPR 2024] is an inference-time algorithm that samples 3D animations from 2D keypoint diffusion models. We demonstrated it by generating 3D animations for characters and scenarios that are challenging to record with elaborate motion capture systems yet ubiquitous in in-the-wild videos, for example, horse racing and professional rhythmic gymnastics. Monkey See, Monkey Do (MoMo) [SIGGRAPH Asia 2024] explores the attention space of the motion diffusion model. A careful analysis reveals the roles of the attention's keys and queries throughout the generation process. With these findings in hand, we design a training-free method that generates motion following the distinct motifs of one motion while adhering to an outline dictated by another. To conclude the talk, I will give my modest take on the open challenges in the field and our lab's current work attempting to tackle some of them.
Organizers: Omid Taheri