Research Overview

The problems studied in the department can be subsumed under the heading of empirical inference, i.e., inference performed on the basis of empirical data. This includes inductive learning (estimation of models, such as functional dependencies, that generalize to novel data sampled from the same underlying distribution) as well as the inference of causal structures from statistical data (leading to models that provide insight into the underlying mechanisms and make predictions about the effect of interventions). Likewise, the type of empirical data can vary, ranging from biomedical measurements to astronomical observations. Our department conducts theoretical, algorithmic, and experimental studies to advance the understanding of empirical inference.
Causal mechanisms in the world give rise to statistical dependencies, but only the latter are exploited by today’s popular ML algorithms. Animate systems, in contrast, are often able to learn about underlying causal structures and mechanisms, traditionally considered harder to estimate. Such knowledge is useful because it lets us predict not only future data coming from the same source, but also the effect of external interventions in a system, and because it facilitates the transfer of detected regularities to new situations generated by `natural' interventions.
While causal learning thus provides more insight into empirical regularities than a purely statistical approach, it poses difficult problems. For instance, while existing methods for estimating causal graphs using conditional independence testing have benefited from our kernel tests for conditional independence, the basic (two-variable) problem of distinguishing cause and effect based on observational data was initially considered unsolvable, as nontrivial conditional independence statements require at least three variables. During the 15 years since we began exploring causality, we have addressed (and in certain settings solved) this issue as well as related problems.
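One way the two-variable problem can become tractable may be sketched as follows (an illustrative special case rather than a full account of our methods): if the effect is generated from the cause by an additive noise model,
\[
Y = f(X) + N, \qquad N \perp\!\!\!\perp X,
\]
then, outside of degenerate cases such as linear $f$ with Gaussian noise, no additive noise model of the same form exists in the reverse direction, so the causal direction leaves a detectable asymmetry in the observed joint distribution.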
Our main interest, however, is the use of causality for (and its connections to) the foundations of machine learning. Most current ML systems implicitly make the `IID' assumption, i.e., that training and test data are drawn independently from the same underlying distribution. We have argued that (1) the IID assumption is unrealistic when trying to build robust intelligent systems, and (2) causal models can address this by decomposing a joint distribution into (physical) mechanisms. The assumption that most (but not all) of these mechanisms remain invariant under distribution shift (`sparse mechanism shift') constitutes a minimal relaxation of the IID setting.
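In the language of causal graphical models, this decomposition can be written (with generic notation) as
\[
p(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}_i\bigr),
\]
where $\mathrm{pa}_i$ denotes the causal parents of $x_i$; the sparse mechanism shift assumption then states that a distribution shift changes only a small subset of the conditionals $p(x_i \mid \mathrm{pa}_i)$, while the remaining mechanisms stay invariant.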
Much of our work is based upon the `independent causal mechanisms' postulate, formalizing the (physical) independence of the mechanisms generating the data. This postulate implies a surprising link between causal direction and the feasibility of certain machine learning approaches: semi-supervised learning is possible only for anticausal (or confounded) learning problems, whereas covariate shift adaptation and transfer are feasible for causal problems (i.e., where the task is to predict effect from cause). This constituted arguably the first such link between causal structure and machine learning, and it initiated the study of invariance for causality in the machine learning community. It won the ten-year test-of-time honourable mention at ICML 2022, a first for a causality paper at a mainstream machine learning conference.
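The intuition behind this link can be sketched for two variables (a simplified rendering of the argument): writing the joint distribution of cause $C$ and effect $E$ as
\[
p(c, e) \;=\; p(c)\, p(e \mid c),
\]
independence of mechanisms says that $p(c)$ carries no information about $p(e \mid c)$. In the causal direction (predicting $E$ from $C$), additional unlabelled inputs only inform $p(c)$ and thus cannot improve the estimate of $p(e \mid c)$, which is why semi-supervised learning should not help; conversely, a shift in $p(c)$ leaves $p(e \mid c)$ untouched, which is what makes covariate shift adaptation feasible.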
What is more, we are beginning to understand how causal machine learning can actually benefit from distribution shifts, since shifts allow the extraction of invariant mechanisms that support generalization to novel distributions (i.e., `out-of-distribution' or OOD generalization), as well as of causal structure that is invariant across shifts. We recently gave an elegant formulation of the assumption of independent causal mechanisms and its connection to distribution shifts via a Causal de Finetti Theorem []. It shows that non-IID data satisfying assumptions of exchangeability and independent mechanisms permit identification of causal structure from observational data in cases where this is impossible from IID data alone.
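Informally, and in a simplified bivariate form (see the cited work for the precise statement), the result says that an exchangeable sequence of pairs $(X_n, Y_n)$ generated with $X \to Y$ and independent mechanisms admits a representation
\[
p(x_{1:N}, y_{1:N}) \;=\; \int\!\!\int \prod_{n=1}^{N} p(x_n \mid \theta)\, p(y_n \mid x_n, \psi)\; d\mu(\theta)\, d\nu(\psi),
\]
with independent latent parameters $\theta$ and $\psi$ governing the cause distribution and the mechanism, respectively; it is this asymmetry across the sequence that makes the causal direction identifiable.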
Another tacit assumption of standard machine learning is that we always observe samples from the joint distribution of a system. It turns out that a causal approach can facilitate combining partial models of different marginal distributions into an overall model, a problem we refer to as `out-of-variable' (OOV) generalization. We hypothesize that our examples of OOD and OOV generalization are just the tip of the iceberg, and we plan to continue studying how causality can benefit machine learning.
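As a stylized illustration of how causal structure lets partial models be merged (a toy case chosen here for exposition, not the general OOV setting): suppose one dataset yields a model of $p(x, y)$ and another a model of $p(y, z)$, and the causal graph is the chain $X \to Y \to Z$. Then $Z \perp\!\!\!\perp X \mid Y$, so the overall model is determined by the two partial ones:
\[
p(x, y, z) \;=\; p(x, y)\, p(z \mid y),
\]
even though $X$ and $Z$ are never observed jointly.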
Causal approaches often start with the assumption that the causal variables are known a priori. In practice, this need not be the case: interventions on a visual scene, for example, will affect many pixels simultaneously, and it is nontrivial to identify intervenable elements or objects in an image. We started looking at questions of causal abstraction in the last decade, and have more recently initiated the study of causal representation learning. In contrast to statistical representations, which focus on preserving statistical information, a causal abstraction or representation additionally needs to represent actions and preserve their interventional semantics (e.g., expressed through a commutative diagram).
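Schematically (with notation introduced here purely for illustration): if $\varphi$ maps low-level observations $x$ to causal variables $z = \varphi(x)$, and an action $a$ transforms observations via $T_a$ and causal variables via $\tau_a$, then the representation should satisfy
\[
\varphi\bigl(T_a(x)\bigr) \;=\; \tau_a\bigl(\varphi(x)\bigr),
\]
i.e., encoding and intervening commute, so that the effect of actions can be reasoned about in the learned causal space.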
We thus aim to move from statistical representations towards learning causal world models or Causal Digital Twins, a program summarized in a paper written to accompany an invited talk at the International Congress of Mathematicians [].
With the development of Large Language Models (LLMs) and other foundation models, another thread of causal work has recently emerged in our department. Rather than starting from causal insights to facilitate or improve machine learning methods, we analyze whether, and to what extent, existing models automatically extract causal knowledge from their training data even though causality is not built into the training methods. Among other results, this led to the CLadder evaluation [] of LLMs, which exhibited major shortcomings on causal reasoning tasks not contained in the training data. We have begun to study similar shortcomings for the case of visual reasoning.
While our work often starts with theoretical foundations, we also aim to take it all the way to practical impact. Here, we take a particular interest in astronomical applications. Our causal algorithm for denoising exoplanet transit signals, developed with collaborators from astronomy, led to the discovery of an exoplanet (K2-18b) that was subsequently analyzed using the Hubble and Webb space telescopes, including the recent detection of carbon dioxide and methane. K2-18b engendered the notion of a `hycean planet' with abundant liquid water and a hydrogen envelope, constituting a tantalizing object for astrobiology. We have also transferred our causal methods to direct exoplanet imaging, with exciting results, and intend to continue our activities in this area.
A second thread of our research in astronomy, pursued with collaborators from the Albert Einstein Institute, uses modern machine learning to characterize gravitational wave events. Using normalizing flows and diffusion methods, we infer posterior distributions over the physical parameters of the black holes or neutron stars involved in a cataclysmic merger event. Crucially, our inference takes only seconds and includes the sky position, which in the future will allow fast electromagnetic follow-up observations of certain events. Our work has led to several publications in the flagship journal of physics, including the recent [] and a paper to appear in Nature. We have transferred these methods to exoplanet imaging, leading to probabilistic algorithms for inferring atmospheric parameters from spectral measurements [].
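The underlying idea of this posterior inference, in simplified form, is amortized simulation-based inference: a conditional density model $q_\phi(\theta \mid d)$ (e.g., a normalizing flow over parameters $\theta$ conditioned on data $d$) is trained on simulated parameter–data pairs by minimizing
\[
\mathbb{E}_{p(\theta)\, p(d \mid \theta)}\bigl[-\log q_\phi(\theta \mid d)\bigr],
\]
which drives $q_\phi(\theta \mid d)$ towards the true posterior $p(\theta \mid d)$; once trained, producing the posterior for a newly observed signal requires only a forward pass through the network, which is what makes second-scale inference possible.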