DataSig Summer Project Funding Available

DataSig are looking for students to undertake summer research projects as part of a collaboration with GCHQ. Funding is available for five projects, each lasting around 4-8 weeks. To apply, please email Viktoriia Davletshyna by 18 June.

Project 1: Measuring when learned classifiers stretch reality

How much does a trained model stretch a data-defined function? Lipschitz norms, robustness, and adversarial images.

Suppose we know a function only through data: for example, images x_i and labels or scores y_i. If we train a function f to extend this data to unseen inputs, does it extend the geometry of the data in a trustworthy way? In particular, does it create regions where tiny changes in input cause large changes in output? The project studies this through empirical Lipschitz constants and adversarial image perturbations.
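As a rough illustration of what an empirical Lipschitz constant measures, the sketch below takes the largest ratio of output change to input distance over all pairs of sample points, first for the raw data and then for a model probed near a steep region. The function `steep_model`, the toy data, and the probe points are hypothetical and chosen only for illustration; they are not part of the project specification.

```python
import math
from itertools import combinations

def empirical_lipschitz(points, values):
    """Largest |v_i - v_j| / ||p_i - p_j|| over all pairs of distinct points."""
    best = 0.0
    for (p, v), (q, w) in combinations(zip(points, values), 2):
        dist = math.dist(p, q)
        if dist > 0.0:
            best = max(best, abs(v - w) / dist)
    return best

def steep_model(x):
    """Hypothetical trained model: fits the data but ramps 0 -> 1 over a width of 0.01."""
    return min(1.0, max(0.0, (x[0] - 0.5) / 0.01 + 0.5))

# Toy "images" with two pixels each, and scalar scores.
train_x = [(0.0, 0.0), (1.0, 0.0)]
train_y = [0.0, 1.0]

# Probe the model at the training points plus two nearby points straddling the ramp.
probes = train_x + [(0.495, 0.0), (0.505, 0.0)]

data_constant = empirical_lipschitz(train_x, train_y)
model_constant = empirical_lipschitz(probes, [steep_model(p) for p in probes])
stretch = model_constant / data_constant

print(f"data: {data_constant:.2f}, model: {model_constant:.2f}, stretch: {stretch:.1f}x")
```

Here the model agrees with the data at both training points, yet its empirical constant is about a hundred times larger, because a change of 0.01 in the input flips the score. Adversarial perturbations can be viewed as a search for such pairs.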

Project 2: Anomalies at scale with signatures, nearest neighbours, and reproducible workflows

From signature anomaly scores to a production-style detector for millions of sensor windows.

Can the nearest-neighbour/signature approach to anomaly detection be turned from a research notebook into a small professional system that ingests many streams, computes robust path features, builds a nearest-neighbour index, and produces ranked “worrisome behaviour” reports?
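The scoring and ranking stage of such a pipeline might look roughly like the sketch below. It assumes each sensor window has already been reduced to a feature vector (for instance a truncated signature), scores a new window by its distance to the closest vector in a reference corpus, and reports the highest scores first. The feature values, window names, and function names are hypothetical, and the brute-force scan is a stand-in for the nearest-neighbour index a system at scale would need.

```python
import math

def nn_scores(corpus, queries):
    """Score each query by its Euclidean distance to the closest corpus vector."""
    return [min(math.dist(q, c) for c in corpus) for q in queries]

def ranked_report(names, scores, top=3):
    """Return (name, score) pairs, most anomalous first."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top]

# Hypothetical feature vectors standing in for path features of "normal" windows.
corpus = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]

# Hypothetical new windows to be scored against the corpus.
queries = {
    "window_a": (0.05, 0.05),
    "window_b": (0.1, 0.05),
    "window_c": (2.0, -1.0),
}

scores = nn_scores(corpus, list(queries.values()))
report = ranked_report(list(queries), scores, top=2)

for name, score in report:
    print(f"{name}: {score:.3f}")
```

The scan costs one distance computation per corpus vector per query, which is the part that stops scaling to millions of windows.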

Project 3: Messages in the Roughness: Signature Kernels and Rough Path Geometry

Many data streams contain weak but highly structured signals hidden within overwhelming noise. In this project we will study a controlled “needle in a haystack” problem in which a finite collection of messages is encoded as smooth rough paths and inserted into Brownian motion (including its Lévy area). The task is to determine which message was embedded in a given observation. While the signal may be invisible to methods based only on first-order information, it can remain detectable through the higher-order geometric information captured by path signatures. Of particular interest are messages encoded through area and other genuinely rough-path effects, providing a mathematically natural setting in which higher-order features become essential. The project will combine mathematical modelling with computational experimentation, using Python and JAX to simulate data, implement kernel methods, and investigate the performance of different approaches in practice.
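A minimal, noise-free sketch of why area matters: the two paths below have the same start and end point, so first-order information (the total increment) cannot tell them apart, but their Lévy areas have opposite signs. The paths `message_0` and `message_1` are hypothetical examples, and the Brownian noise described above is deliberately left out to keep the calculation exact.

```python
def increment(path):
    """Level-1 signature: total displacement of the path."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    return (x1 - x0, y1 - y0)

def levy_area(path):
    """Antisymmetric level-2 signature term, exact for a piecewise-linear path.

    Computes 0.5 * integral of (x - x0) dy - (y - y0) dx along the path.
    """
    x0, y0 = path[0]
    total = 0.0
    for (xa, ya), (xb, yb) in zip(path, path[1:]):
        total += (xa - x0) * (yb - ya) - (ya - y0) * (xb - xa)
    return 0.5 * total

# Two hypothetical "messages": the same unit square traversed in opposite directions.
message_0 = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
message_1 = list(reversed(message_0))

for name, path in [("message_0", message_0), ("message_1", message_1)]:
    print(name, "increment:", increment(path), "Levy area:", levy_area(path))
```

Decoding here reduces to reading the sign of the area. With noise added, the same quantity becomes a statistic whose reliability has to be studied.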

The project will focus on signature kernels and their partial differential equation formulation. Signature kernels can be computed via a hyperbolic Goursat PDE, while recent work of Lemercier et al. extends this framework by incorporating higher-order log-signature information directly into the PDE system, allowing rough-path features such as area to be incorporated without explicitly resolving fine-scale oscillations. This leads to a range of mathematical and computational questions: which levels of the signature are genuinely informative, how much discrimination is gained by incorporating higher-order log-signatures, and under what conditions can a hidden message be recovered reliably from noise? Further directions include efficient numerical methods for the associated PDEs, the statistical limits of message recovery, and connections with attention-based constructions arising from hyperbolic developments and linear controlled differential equations.
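For orientation, the basic Goursat problem for two differentiable paths x and y is ∂²k/∂s∂t = ⟨x'(s), y'(t)⟩ k, with k = 1 along both axes. The sketch below solves it for piecewise-linear paths with a simple explicit finite-difference update, one per grid cell. This is a first-order illustration in plain Python, not the higher-order log-signature system mentioned above. The function name `signature_kernel` and the test paths are assumptions made for the example.

```python
def signature_kernel(x, y):
    """Approximate the signature kernel of two piecewise-linear paths.

    Each path is a list of points (tuples of equal dimension). On every grid
    cell the double integral of <dx, dy> * k is approximated by the inner
    product of the increments times the average of two known corner values.
    """
    m, n = len(x) - 1, len(y) - 1
    k = [[1.0] * (n + 1) for _ in range(m + 1)]  # boundary condition k = 1
    for i in range(m):
        dx = [b - a for a, b in zip(x[i], x[i + 1])]
        for j in range(n):
            dy = [b - a for a, b in zip(y[j], y[j + 1])]
            inc = sum(p * q for p, q in zip(dx, dy))
            k[i + 1][j + 1] = (k[i + 1][j] + k[i][j + 1] - k[i][j]
                               + 0.5 * inc * (k[i + 1][j] + k[i][j + 1]))
    return k[m][n]

# Unit line segment sampled on a fine grid, so the scheme has small cells.
steps = 64
line = [(i / steps,) for i in range(steps + 1)]
value = signature_kernel(line, line)

# Two paths moving in orthogonal directions: every increment product vanishes.
east = [(i / steps, 0.0) for i in range(steps + 1)]
north = [(0.0, i / steps) for i in range(steps + 1)]
orthogonal_value = signature_kernel(east, north)

print(f"line vs line: {value:.4f}, orthogonal: {orthogonal_value:.4f}")
```

For two copies of the unit line segment the exact kernel is the sum of 1/(n!)² over n ≥ 0, roughly 2.2796, which gives a convenient check on the discretisation. The scheme's accuracy depends on the grid being fine relative to the path's oscillations, which is the cost the higher-order formulation aims to avoid.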
