Thu, 15 Feb 2024

16:00 - 17:00
Virtual

From Lévy's stochastic area formula to universality of affine and polynomial processes via signature SDEs

Christa Cuchiero
(University of Vienna)
Abstract

A plethora of stochastic models used in mathematical finance in particular, but also in population genetics and physics, stems from the class of affine and polynomial processes. The history of these processes is closely connected, on the one hand, with the important concept of tractability, that is, a substantial reduction of computational effort due to special structural features, and, on the other hand, with a unifying framework for a large number of probabilistic models. One early instance in the literature where this unifying affine and polynomial point of view can be applied is Lévy's stochastic area formula. Starting from this example, we present a guided tour through the main properties and recent results, which lead to signature stochastic differential equations (SDEs). They constitute a large class of stochastic processes, here driven by Brownian motions, whose characteristics are entire or real-analytic functions of their own signature, i.e. of iterated integrals of the process with itself, and which therefore allow for generic path dependence. We show that their prolongation with the corresponding signature is an affine and polynomial process taking values in subsets of group-like elements of the extended tensor algebra. Signature SDEs are thus a class of stochastic processes that is universal within Itô processes with path-dependent characteristics and that allows, due to the affine theory, for a relatively explicit characterization of the Fourier-Laplace transform and hence of the full law on path space.
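
For orientation, Lévy's stochastic area formula in its simplest (unconditioned) form can be written as below. This is the standard textbook statement rather than anything taken from the talk itself; the symbols $B^1$, $B^2$ (independent standard Brownian motions), $A_t$ (the Lévy area) and $\lambda$ are notation assumed here.

```latex
A_t = \frac{1}{2}\int_0^t \left( B^1_s \, dB^2_s - B^2_s \, dB^1_s \right),
\qquad
\mathbb{E}\left[ e^{i \lambda A_t} \right] = \frac{1}{\cosh(\lambda t / 2)},
\quad \lambda \in \mathbb{R}.
```

The closed form for the characteristic function is the kind of explicit Fourier-Laplace expression that the affine and polynomial viewpoint aims to produce.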

Fri, 01 Dec 2023

14:00 - 15:00
Virtual

Sequence models in biomedicine: from predicting disease progression to genome editing outcomes

Professor Michael Krauthammer
(Department of Quantitative Biomedicine University of Zurich)
Abstract

Sequential biomedical data is ubiquitous, from time-resolved data about patient encounters in the clinical realm to DNA sequences in the biological domain.  The talk will review our latest work in representation learning from longitudinal data, with a particular focus on finding optimal representations for complex and sparse healthcare data. We show how these representations are useful for comparing patient journeys and finding patients with similar health outcomes. We will also venture into the field of genome engineering, where we build models that work on DNA sequences for predicting editing outcomes for base and prime editors. 

Fri, 17 Nov 2023

14:00 - 15:00
Virtual

The generalist medical AI will see you now

Professor Pranav Rajpurkar
(Department of Biomedical Informatics Harvard Medical School Boston)
Abstract

Accurate interpretation of medical images is crucial for disease diagnosis and treatment, and AI has the potential to minimize errors, reduce delays, and improve accessibility. The focal point of this presentation lies in a grand ambition: the development of 'Generalist Medical AI' systems that can closely resemble doctors in their ability to reason through a wide range of medical tasks, incorporate multiple data modalities, and communicate in natural language. Starting with pioneering algorithms that have already demonstrated their potential in diagnosing diseases from chest X-rays or electrocardiograms, matching the proficiency of expert radiologists and cardiologists, I will delve into the core challenges and advancements in the field. The discussion will navigate towards the topic of label-efficient AI models: with a scarcity of meticulously annotated data in healthcare, the development of AI systems capable of learning effectively from limited labels has become a key concern. In this vein, I'll delve into how the innovative use of self-supervision and pre-training methods has led to algorithmic advancements that can perform high-level diagnostic tasks using significantly less annotated data. Additionally, I will talk about initiatives in data curation, human-AI collaboration, and the creation of open benchmarks to evaluate the generalizability of medical AI algorithms. In sum, this talk aims to deliver a comprehensive picture of the state of 'Generalist Medical AI,' the advancements made, the challenges faced, and the prospects lying ahead.

Fri, 20 Oct 2023

15:00 - 16:00
Virtual

Machine learning for identifying translatable biomarkers and targets

Professor Daphne Koller
(Department of Computer Science Stanford University)
Abstract

Modern medicine has given us effective tools to treat some of the most significant and burdensome diseases. At the same time, it is becoming consistently more challenging and more expensive to develop new therapeutics. A key factor in this trend is that we simply don't understand the underlying biology of disease, and which interventions might meaningfully modulate clinical outcomes and in which patients. To address this, we are bringing together large amounts of high-content data, taken both from humans and from human-derived cellular systems generated in our own lab. Those are then used to learn a meaningful representation of biological states via cutting-edge machine learning methods, which enable us to make predictions about novel targets, coherent patient segments, and the clinical effect of molecules. Our ultimate goal is to develop a new approach to drug development that uses high-quality data and ML models to design novel, safe, and effective therapies that help more people, faster, and at a lower cost.

Tue, 06 Jun 2023

17:00 - 18:00
Virtual

The Critical Beta-splitting Random Tree

David Aldous
(U.C. Berkeley and University of Washington)
Further Information

Part of the Oxford Discrete Maths and Probability Seminar, held via Zoom. Please see the seminar website for details.

Abstract

In the critical beta-splitting model of a random $n$-leaf rooted tree, clades (subtrees) are recursively split into sub-clades, and a clade of $m$ leaves is split into sub-clades containing $i$ and $m-i$ leaves with probabilities $\propto 1/(i(m-i))$. This model turns out to have interesting properties. There is a canonical embedding into a continuous-time model ($\operatorname{CTCS}(n)$). There is an inductive construction of $\operatorname{CTCS}(n)$ as $n$ increases, analogous to the stick-breaking constructions of the uniform random tree and its limit continuum random tree. We study the heights of leaves and the limit fringe distribution relative to a random leaf. In addition to familiar probabilistic methods, there are analytic methods (developed by co-author Boris Pittel), based on explicit recurrences, which often give more precise results. So this model provides an interesting concrete setting in which to compare and contrast these methods. Many open problems remain.
Preprints at https://arxiv.org/abs/2302.05066 and https://arxiv.org/abs/2303.02529
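
The splitting rule in the abstract can be sketched in a few lines of Python. This is only an illustration of the discrete model as described above (a clade of $m$ leaves splits into sizes $i$ and $m-i$ with probability proportional to $1/(i(m-i))$, for $i = 1, \dots, m-1$); it does not implement the continuous-time embedding $\operatorname{CTCS}(n)$. The function names and the nested-tuple tree representation are assumptions made for this sketch.

```python
import random


def split_probabilities(m):
    """Probabilities of splitting a clade of m leaves into sizes (i, m - i), i = 1..m-1."""
    weights = [1.0 / (i * (m - i)) for i in range(1, m)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_tree(n, rng):
    """Sample an n-leaf tree as nested tuples; a single leaf is the empty tuple ()."""
    if n == 1:
        return ()
    sizes = list(range(1, n))
    # Choose the size i of the left sub-clade with weight 1 / (i * (n - i)).
    i = rng.choices(sizes, weights=split_probabilities(n))[0]
    return (sample_tree(i, rng), sample_tree(n - i, rng))


def leaf_heights(tree, depth=0):
    """Return the list of heights (number of splits above each leaf)."""
    if tree == ():
        return [depth]
    left, right = tree
    return leaf_heights(left, depth + 1) + leaf_heights(right, depth + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    heights = leaf_heights(sample_tree(200, rng))
    print("leaves:", len(heights), "mean leaf height:", sum(heights) / len(heights))
```

Averaging `leaf_heights` over many sampled trees gives a rough empirical view of the leaf-height statistics that the talk studies analytically.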

Tue, 06 Jun 2023

15:30 - 16:30
Virtual

The Metropolis Algorithm for the Planted Clique Problem

Elchanan Mossel
(MIT)
Further Information

Part of the Oxford Discrete Maths and Probability Seminar, held via Zoom. Please see the seminar website for details.

Abstract

More than 30 years ago, Jerrum studied the planted clique problem and proved that under worst-case initialization Metropolis fails to recover planted cliques of size $\ll n^{1/2}$ in the Erdős-Rényi graph $G(n,1/2)$. This result is classically cited in the literature on the problem as the "first evidence" that finding planted cliques of size much smaller than $\sqrt{n}$ is "algorithmically hard". Cliques of size $\gg n^{1/2}$ are easy to find using simple algorithms. In a recent work we show that the Metropolis process actually fails to find planted cliques under worst-case initialization for cliques up to size almost linear in $n$. Thus the algorithm fails well beyond the $\sqrt{n}$ "conjectured algorithmic threshold". We also prove that, for a large parameter regime, the Metropolis process fails under "natural initialization" as well. Our results resolve some open questions posed by Jerrum in 1992. Based on joint work with Zongchen Chen and Ilias Zadik.
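
As a rough illustration of the process discussed above, the following Python sketch runs one common formulation of a Metropolis chain on the cliques of a graph with a planted clique. Several details are assumptions made for this sketch and may differ from the variant analysed in the talk: the chain picks a uniformly random vertex, adds it if the result is still a clique, and removes a current member with probability `1 / lam`; the chain starts from the empty clique by default. The function names and the parameter `lam` are hypothetical.

```python
import random
from itertools import combinations


def planted_clique_graph(n, k, rng):
    """Sample G(n, 1/2) and plant a clique on k random vertices.

    Returns (adjacency sets, planted vertex set).
    """
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)
    planted = set(rng.sample(range(n), k))
    for u, v in combinations(sorted(planted), 2):
        adj[u].add(v)
        adj[v].add(u)
    return adj, planted


def metropolis_step(adj, clique, lam, rng):
    """One step of the chain; modifies `clique` in place."""
    v = rng.randrange(len(adj))
    if v in clique:
        # Removing a vertex shrinks the clique: accept with probability 1 / lam.
        if rng.random() < 1.0 / lam:
            clique.remove(v)
    elif clique <= adj[v]:
        # v is adjacent to every current member, so adding it keeps a clique.
        clique.add(v)


def run_metropolis(adj, lam, steps, rng, start=None):
    """Run the chain for `steps` steps from `start` (empty clique by default)."""
    clique = set(start or ())
    for _ in range(steps):
        metropolis_step(adj, clique, lam, rng)
    return clique


if __name__ == "__main__":
    rng = random.Random(1)
    adj, planted = planted_clique_graph(200, 20, rng)
    found = run_metropolis(adj, lam=2.0, steps=20000, rng=rng)
    print("clique size:", len(found), "overlap with planted:", len(found & planted))
```

Passing a different `start` clique is one way to experiment with the distinction between worst-case and natural initialization mentioned in the abstract.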

Fri, 19 May 2023

14:00 - 15:00
Virtual

Mapping and navigating biology and chemistry with genome-scale imaging

Dr Imran Haque
(Recursion Pharmaceuticals)
Abstract

Image-based readouts of biology are information-rich and inexpensive. Yet historically, bespoke data collection methods and the intrinsically unstructured nature of image data have made these assays difficult to work with at scale. This presentation will discuss advances made at Recursion to industrialise the use of cellular imaging to decode biology and drive drug discovery. First, the use of deep learning allows the transformation of unstructured images into biologically meaningful representations, enabling a 'map of biology' relating genetic and chemical perturbations to scale drug discovery. Second, building such a map at whole-genome scale led to the discovery of a "proximity bias" globally confounding CRISPR-Cas9-based functional genomics screens. Finally, I will discuss how publicly-shared resources from Recursion, including the RxRx3 dataset and MolRec application, enable downstream research both on cellular images themselves and on deep learning-derived embeddings, making advanced image analysis more accessible to researchers worldwide.

Fri, 05 May 2023

14:00 - 15:00
Virtual

Data-driven protein design and molecular latent space simulators

Professor Andrew Ferguson
(Pritzker School of Molecular Engineering University of Chicago)
Abstract

Data-driven modeling and deep learning present powerful tools that are opening up new paradigms and opportunities in the understanding, discovery, and design of soft and biological materials. I will describe our recent applications of deep representational learning to expose the sequence-function relationship within homologous protein families and to use these principles for the data-driven design and experimental testing of synthetic proteins with elevated function. I will then describe an approach based on latent space simulators to learn ultra-fast surrogate models of protein folding and biomolecular assembly by stacking three specialized deep learning networks to (i) encode a molecular system into a slow latent space, (ii) propagate dynamics in this latent space, and (iii) generatively decode a synthetic molecular trajectory.

Tue, 07 Mar 2023

15:30 - 16:30
Virtual

Correlated stochastic block models: graph matching and community recovery

Miklos Racz
(Northwestern University)
Further Information

Part of the Oxford Discrete Maths and Probability Seminar, held via Zoom. Please see the seminar website for details.

Abstract

I will discuss statistical inference problems on edge-correlated stochastic block models. We determine the information-theoretic threshold for exact recovery of the latent vertex correspondence between two correlated block models, a task known as graph matching. As an application, we show how one can exactly recover the latent communities using multiple correlated graphs in parameter regimes where it is information-theoretically impossible to do so using just a single graph. Furthermore, we obtain the precise threshold for exact community recovery using multiple correlated graphs, which captures the interplay between the community recovery and graph matching tasks. This is based on joint work with Julia Gaudio and Anirudh Sridhar.
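
To make the setting concrete, here is a minimal Python sketch of one standard way to generate a pair of edge-correlated stochastic block models: sample a parent two-community block model, keep each parent edge independently in each child graph with probability `s`, and relabel the vertices of the second graph by a hidden random permutation. The exact model in the talk may differ in its details; the function name and the parameters `p`, `q` and `s` are assumptions made for this sketch.

```python
import random
from itertools import combinations


def correlated_sbm(n, p, q, s, rng):
    """Sample two edge-correlated two-community block models.

    Returns (labels, perm, g1, g2): community labels, the hidden vertex
    permutation applied to the second graph, and the two edge sets.
    """
    labels = [rng.choice((+1, -1)) for _ in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)
    g1, g2 = set(), set()
    for u, v in combinations(range(n), 2):
        edge_prob = p if labels[u] == labels[v] else q
        if rng.random() < edge_prob:  # edge present in the parent graph
            if rng.random() < s:      # independently kept in the first graph
                g1.add((u, v))
            if rng.random() < s:      # independently kept in the second graph
                a, b = perm[u], perm[v]
                g2.add((min(a, b), max(a, b)))
    return labels, perm, g1, g2


if __name__ == "__main__":
    rng = random.Random(0)
    labels, perm, g1, g2 = correlated_sbm(100, 0.3, 0.05, 0.8, rng)
    print("edges in first graph:", len(g1), "edges in second graph:", len(g2))
```

In this sketch, graph matching corresponds to recovering `perm` from `g1` and `g2` alone, and community recovery corresponds to recovering `labels` (up to a global sign flip).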
