Thursday, 24 June 2021

The Development of Mathematical Biology - a report from the Society for Mathematical Biology Conference

Oxford mathematical biologists, past and present, featured very prominently at the annual Society for Mathematical Biology (SMB) meeting held from 13 to 17 June and organised remotely from the University of California, Riverside. It was the largest ever conference in the field, with over 2,500 participants and more than 1,000 talks delivered by speakers from 47 countries. The conference had an unusual format, running 24 hours a day to cater for the worldwide audience. 

The breadth of topics covered showed how the subject has developed and changed over the past 10 years. Mathematical models are now being increasingly used to inform biological understanding (here the term biology is used very broadly and encompasses medicine, ecology, and epidemiology) at a level of detail not seen before.

This leads to new modelling challenges, for example in developing ways to integrate the microscale detail of biological systems to derive mathematically tractable macroscale models. This, in turn, leads to new mathematical equations that require advances in analysis techniques. In addition, there was increasing emphasis on the use of data to validate models, and this brings with it novel challenges in spatial statistics, parameter estimation and identifiability. Key themes of the conference were education and equality, diversity and inclusion.

The conference began with Kit Yates, formerly of Oxford Mathematics' Wolfson Centre for Mathematical Biology (WCMB) and now a Senior Lecturer at the University of Bath, delivering a talk featuring the work that won him the Society’s Lee Segel Prize for the best paper published in the Bulletin of Mathematical Biology. It ended with four of the prizes for best talk delivered in various sessions going to members of the WCMB. 

Johannes Borgqvist was awarded a Postdoc Prize for his presentation “Symmetry methods for model-construction and analysis” and Aden Forrow won the Data-Driven Prize for his talk “Learning stochastic dynamics with measurement noise.” Duncan Martinson was awarded a Contributed Talk Prize for his presentation “Extracellular matrix remodelling by neural crest cells provides a robust signal for collective migration”, and Solveig van der Vegt won a student prize for her talk “Mathematical modelling of autoimmune myocarditis and the effects of immune checkpoint inhibitors”. In addition, Mohit Dalwadi from the Oxford Centre for Industrial and Applied Mathematics (OCIAM) won a Contributed Talk Prize for his presentation “Emergent robustness of bacterial quorum sensing in fluid flow”, and former WCMB students Linus Schumacher (now at Edinburgh) and Jody Reimer (now at Utah) also won prizes.

Thursday, 17 June 2021

What takes a mathematician to the Arctic?

What takes a mathematician to the Arctic? In short, context. The ice of the Arctic Ocean has been a rich source of mathematical problems since the late 19th century, when Josef Stefan, aided by data from expeditions that went in search of the Northwest Passage, developed the classical Stefan problem. This describes the evolution of a moving boundary at which a material undergoes a phase change. In recent years, interest in the Arctic has only increased because of the rapid changes occurring there as a result of climate change. The Arctic is particularly sensitive to changes in global temperatures, with consequences for the rest of the northern hemisphere that can include extreme weather events such as the 'Beast from the East' in 2018.

In 2019, I, Oxford Mathematician Michael Coughlan, undertook fieldwork in the central Arctic as part of the SODA project aboard the USCGC Healy (with Martin Doble of Polar Scientific Ltd.), deploying buoys onto the ice between $70^{\circ}$N and $80^{\circ}$N that would drift with the currents and measure conditions above, below and within the ice. While there, I also took measurements of the ice floes on which we were working, and made observations of the ice and the ocean to inform my modelling work.


Figure 1: (a) Melt ponds in the Arctic Ocean, 2014. Pictured is the icebreaker RV Araon, which (for scale) is $110$ m long. Photo: M. Doble, Polar Scientific Ltd. (b) Buoy deployment, Beaufort Sea, September 2019.

My work, with Oxford Mathematicians Ian Hewitt and Sam Howison and Oxford Physicist Andrew Wells, focusses on modelling the formation and evolution of melt ponds on sea ice. These are features that form when water from the melting surface of the ice collects in the hollows and troughs of the topography. Ponds contribute further to melting of the floes, as water absorbs much more sunlight than the surrounding ice; the extra melting produces more surface water, a positive feedback known as the ice-albedo feedback. The presence of surface water thereby hastens melting of the ice below it, often leading to the fracturing and break-up of floes (Arntsen et al., 2015). Because of this feedback, care must be taken to incorporate the behaviour of ponds in climate models, and my research investigates how to do this.


Figure 2: Time series of modelled area fraction of ponds (pond area as a fraction of total floe area) compared to data from two field experiments by Polashenski et al (2012). Time is measured from the onset of ponding. Left: 'North' site. Right: 'South' site.

Ponds display a range of geometries, from simple, single ponds, with areas of the order of several square metres, to large clusters, made up of what were previously many individual ponds, which can span areas of many hundreds of square metres (Hohenegger et al., 2012). These geometries are due in part to the distinct life cycle of ponds, which form initially where water from bare ice flows to the lowest parts of individual catchments in the topography. They then grow, begin to join one another and can eventually flood the whole floe. Interestingly, however, once the ice warms enough, it becomes permeable and the ponds drain into the sea. From then on, ponds only remain where the ice surface lies below sea level. Ponds also melt and sculpt the ice underneath them. Therefore, the history of their growth has an effect on their geometry and extent once the ponds drain to sea level.

We model the growth and evolution of ponds on a network in which the nodes are individual catchments, and links exist between each pair of nodes that neighbour one another. We express the behaviour of the ponds as a dynamical system on a network.

To do so, we assign to each node $i$ a dynamical variable $x_i(t)$, which we call the activity. A dynamical system on the network can then be written as \begin{equation}\label{eq:activity} \dot{x}_i = f_i(x_i,t) + \sum_{j} A_{ij}\,g_{ij}(x_i,x_j,t), \end{equation} where $f_i$ is a function of the node attributes and activity, and $g_{ij}$ represents how neighbouring nodes affect each other (Porter and Gleeson, 2016). The adjacency matrix $A_{ij}$ encodes the network structure. Here, the activity is the pond water level in each catchment. The function $f_i$ represents the contribution of melt-water both flowing in from the bare ice in the catchment and from the ice melting at the pond floor, derived from parametrised melt rates for bare and pond-covered ice and from conservation of mass. The function $g_{ij}$ represents the fluxes between neighbouring catchments, which can take the form of one pond overflowing into another or two ponds joining together. Drainage is incorporated by adding a special 'ocean' node connected to each catchment, with a possible flux through the ice along each of the corresponding links.
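The general form of this equation can be illustrated with a minimal Python sketch that integrates it with forward Euler. The node and coupling functions are hypothetical placeholders, not the melt-pond model: `f` adds a constant melt-water inflow to every catchment, and `g` lets neighbouring ponds exchange water at a rate proportional to their level difference, a toy stand-in for joining. Overflow rules and the ocean node are omitted.

```python
import numpy as np


def simulate_network(A, f, g, x0, dt, n_steps):
    """Forward-Euler integration of dx_i/dt = f_i(x_i, t) + sum_j A_ij g_ij(x_i, x_j, t).

    A: (n, n) adjacency matrix; f(x, t) returns an (n,) array;
    g(xi, xj, t) must broadcast, so g(x[:, None], x[None, :], t) is (n, n).
    Returns the trajectory as an array of shape (n_steps + 1, n).
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    t = 0.0
    trajectory = [x.copy()]
    for _ in range(n_steps):
        coupling = g(x[:, None], x[None, :], t)  # entry (i, j) is g_ij(x_i, x_j, t)
        dx = f(x, t) + (A * coupling).sum(axis=1)  # sum over neighbours j
        x = x + dt * dx
        t += dt
        trajectory.append(x.copy())
    return np.array(trajectory)


# Hypothetical toy example: three catchments in a row, the middle one joined to both others.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
melt_inflow = lambda x, t: 0.01 * np.ones_like(x)  # constant inflow per catchment
exchange = lambda xi, xj, t: 0.5 * (xj - xi)  # water moves from higher to lower level
levels = simulate_network(A, melt_inflow, exchange, [0.0, 0.5, 1.0], dt=0.1, n_steps=100)
```

Because `A` is symmetric and the exchange term is antisymmetric in $x_i$ and $x_j$, the coupling only moves water between catchments: the total changes only through the inflow term, while the levels even out over time.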


Figure 3: Visualisation of pond geometry and network structure for an ice floe. Dotted lines denote inactive edges, solid blue denote joins, and dashed yellow lines denote overflows. Greyed areas denote ponded regions. Red lines denote the boundaries between catchments. Blue circles denote draining catchments.

Time series of pond area predicted by the model are shown in Figure 2, compared with field data (Polashenski et al., 2012). The model shows good qualitative agreement with the data, and resolves the important features of the life cycle of ponds. The model can be used to estimate the duration of the different processes in the life cycle, and affords us a computationally cheap way to parametrise pond dynamics. Further, by varying the parameters, we can explore how pond extent is likely to change as the Arctic warms.


A. E. Arntsen, A. J. Song, D. K. Perovich, and J. A. Richter-Menge. Observations of the summer breakup of an Arctic sea ice cover. Geophys. Res. Lett., 42(19): 8057–8063, 2015. ISSN 19448007. doi: 10.1002/2015GL065224.

C. Hohenegger, B. Alali, K. R. Steffen, D. K. Perovich, and K. M. Golden. Transition in the fractal geometry of Arctic melt ponds. Cryosph., 6(5):1157–1162, 2012. ISSN 19940416. doi: 10.5194/tc-6-1157-2012.

C. Polashenski, D. Perovich, and Z. Courville. The mechanisms of sea ice melt pond formation and evolution. J. Geophys. Res. Ocean., 117(1):1–23, 2012. ISSN 21699291. doi: 10.1029/2011JC007231.

M. A. Porter and J. P. Gleeson. Dynamical systems on networks: a tutorial. Frontiers in applied dynamical systems. Springer, Cham, 2016. ISBN 9783319266411.

Wednesday, 9 June 2021

Alison Etheridge appointed Chair of the Council for the Mathematical Sciences

Congratulations to Professor Alison Etheridge FRS, who has been appointed as the new Chair of the Council for the Mathematical Sciences (CMS). The CMS represents the whole breadth of the mathematical sciences in the UK, with input from the Institute of Mathematics and its Applications (IMA), the London Mathematical Society (LMS), the Royal Statistical Society (RSS), the Edinburgh Mathematical Society (EMS) and the Operational Research Society (ORS).

Mike Giles, Head of the Mathematical Institute in Oxford, said: "As Alison has a joint appointment in the Mathematical Institute and the Department of Statistics in Oxford (of which she is Head), as well as being chair of the Research Excellence Framework (REF) Mathematical Sciences panel, and serving on the Engineering and Physical Sciences Research Council (EPSRC), she is ideally qualified for this broad and important role."

Alison is Professor of Probability in Oxford, having worked at the Universities of Cambridge, Berkeley and Edinburgh and at Queen Mary University of London before returning to Oxford. Her interests have ranged from abstract mathematical problems to concrete applications, with her recent work focused on mathematical modelling of population genetics.

Read more on the CMS website

Tuesday, 8 June 2021

Modelling changes in infectiousness in COVID-19

Oxford Mathematician William Hart and former Oxford Mathematician Dr Robin Thompson (now an Assistant Professor at the University of Warwick) discuss their latest joint COVID-19 research (carried out with fellow Oxford Mathematician Philip Maini), using mathematical models to infer changes in infectiousness during SARS-CoV-2 infections.

"When a person is infected by SARS-CoV-2 [the virus that causes COVID-19], the risk that they go on to infect someone else varies during their course of infection. Changes in the amount of virus within the infected person affect this transmission risk, as do behavioural factors such as the number of contacts that they have with others.

Understanding the effectiveness of public health measures introduced to combat the COVID-19 pandemic requires changes in infectiousness during an infection to be assessed. For example, isolation strategies are only likely to be effective if infected individuals are isolated over the period when they are most infectious. Consequently, there is widespread interest in estimating how infectiousness varies during infection.

Most previous studies estimating changes in infectiousness during a SARS-CoV-2 infection have made an unrealistic assumption: that the risk of an individual transmitting the virus does not depend on when during infection they develop symptoms (i.e. whether they are pre-symptomatic or symptomatic at a given time since infection). In reality, infected individuals may be less likely to infect others once they develop symptoms, because they are then more likely to stay at home.

We developed a new framework for estimating how an infected person’s infectiousness changes during a SARS-CoV-2 infection, explicitly linking the transmission risk to when they develop symptoms. Our method provides an improved fit to data from SARS-CoV-2 infected individuals compared to existing approaches. Our model predicts that a high proportion of transmissions (around 65%) occur before symptoms develop. Further, the transmission risk is highest immediately before symptoms. This highlights the importance of identifying people who have come into contact with known infected individuals. If these contacts can be found and isolated before they develop symptoms, then transmission of the virus can be reduced.
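The idea of a presymptomatic proportion can be made concrete: if $\beta(\tau)$ is the expected infectiousness at time $\tau$ relative to symptom onset ($\tau < 0$ before symptoms), the proportion of transmissions occurring before symptoms is $\int_{-\infty}^{0}\beta\,d\tau \big/ \int_{-\infty}^{\infty}\beta\,d\tau$. The Python sketch below evaluates this for a hypothetical piecewise-exponential profile that peaks just before onset and drops by a factor `c` afterwards, mimicking reduced contacts once symptomatic. Its parameters are arbitrary illustrations, not the fitted model or the estimates behind the figure of around 65%.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a particular NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def presymptomatic_fraction(beta_pre, beta_post, horizon=40.0, n=40001):
    """Fraction of expected transmissions occurring before symptom onset (tau = 0).

    beta_pre(tau) is the infectiousness for tau < 0 and beta_post(tau) for tau >= 0.
    Each side is integrated on its own grid, so the jump at onset is handled cleanly.
    """
    tau_pre = np.linspace(-horizon, 0.0, n)
    tau_post = np.linspace(0.0, horizon, n)
    pre = trapezoid(beta_pre(tau_pre), tau_pre)
    post = trapezoid(beta_post(tau_post), tau_post)
    return pre / (pre + post)


# Hypothetical profile: rises as exp(a * tau) up to onset, then drops by a factor c
# (behaviour change) and decays as exp(-b * tau). Analytically the fraction is
# (1/a) / (1/a + c/b), which is 0.8 for the values below.
a, b, c = 1.0, 1.0, 0.25
frac = presymptomatic_fraction(lambda t: np.exp(a * t), lambda t: c * np.exp(-b * t))
```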

Our major finding – that many transmissions occur shortly before symptoms – is of interest to public health policy makers. The new methodology, and our estimates of changes in the transmission risk during infection, are also useful for epidemic modellers working on COVID-19 and other diseases. Our research will help to make other models more accurate, including models used for projecting future numbers of cases or deaths and models used to estimate changes in the R number.

We used data from the early months of the pandemic in our study. Since then, we have been exploring how characteristics of transmission have changed as the pandemic has progressed. While our original article used data from different countries, we recently released a new preprint exploring how transmission has changed in the UK during the pandemic, using household data collected by Public Health England. Models used to inform policy should include the most accurate possible estimates of infectiousness, to ensure that implemented interventions are based upon the best available evidence. This includes using up-to-date estimates, ideally from the specific location under consideration."


Figure caption. A. The expected infectiousness of an infected individual at each time since infection, predicted either using a commonly used approach in which infectiousness is assumed to be independent of when the individual develops symptoms (blue), or using our mechanistic model which explicitly links infectiousness to symptoms (red). B. Equivalent to panel A but instead showing infectiousness relative to the time at which the individual develops symptoms. The drop in infectiousness is due to changes in behaviour when infected individuals develop symptoms. C. Comparison of the fit provided by the two models to data describing serial intervals (the time between symptom onset in an infector and symptom onset in the person they infect).


Sunday, 6 June 2021

Arbitrage-free neural-SDE market models

Oxford Mathematicians Samuel N. Cohen, Christoph Reisinger and Sheng Wang have developed new methods to help machine learning build economically reasonable models for options markets. By embedding no-arbitrage restrictions within a neural network, more trustworthy and realistic models can be built, allowing for better risk management in the banking system.

"A European call option is a financial contract that gives the buyer the right, but not the obligation, to acquire an underlying asset (e.g. Facebook shares, crude oil) at a specified strike price, denoted $K$, at a future expiry date, denoted $T$. Since a call option buyer is guaranteed a non-negative payoff at the option's expiry, the buyer pays a premium to purchase the option, known as the option price and denoted $C(T,K)$. Banks, funds and many other financial institutions actively trade these options, either to hedge their risk exposures to the underlying asset or simply to speculate on the market. Both the underlying asset price and the option prices change stochastically over time, exposing option traders to considerable market risks. To manage the risk of traders' option positions, it is crucial to model the joint dynamics of liquid call options and their underlying asset. However, restrictive model specifications expose the risk manager to model risk: the danger of false predictions and erroneous decision-making due to inaccurate assumptions about market behaviour.



Figure 1. The quoted strikes (horizontal axis) for all expiries (vertical axis) of CME weekly- and monthly-listed EURUSD European call options as of 31st May, 2018. Each dot represents a market quote.

In our recent work, we harness advances in neural network methodology and combine them with fundamental economic considerations of options markets to learn a realistic model for the dynamics of options books from observed price time series. The learned model can then be used for generating forward-looking scenarios in the estimation of risk measures of positions or the valuation of more exotic derivative contracts.

Specifically, we consider a financial market where the following assets are liquidly traded: a stock $S$ and a collection of European call options $C(T,K)$ on $S$ with various expiries $T \in \mathcal{T}$ and strikes $K \in \mathcal{K}$ (see Figure 1 as an example). These assets' prices are known to be related to each other in complex ways. For example, the volatility of the stock price that is consistent with the price of an option of a given strike and maturity (the implied volatility) depends on these variables in a characteristic, qualitatively universal way. These stylised relationships suggest that there is significant statistical information captured in the interrelated prices of the stock and options. However, modelling these jointly is challenging. Furthermore, as we detail below, there are various constraints on prices which must hold in the absence of arbitrage, and these should be reflected in any statistical model.

Arbitrage refers to a costless trading strategy that has zero risk and a positive probability of profit. A static arbitrage is an arbitrage exploitable by fixed positions in options and the underlying stock taken at the initial time. For example, the condition $C(T,K_1) \geq C(T,K_2)$ must hold for $K_1 < K_2$; otherwise, by buying one $(T, K_1)$ option and selling one $(T, K_2)$ option, we make an immediate profit of $C(T, K_2) - C(T, K_1)$ with a non-negative terminal payoff. Crucially, this effect is independent of any assumptions on the behaviour of the underlying asset. Another type of arbitrage is dynamic arbitrage, where positions in options and the underlying stock can change over time; dynamic arbitrage relies on the dynamics and path properties of the tradable assets. An economically meaningful model should admit neither type of arbitrage.
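The monotonicity condition can be checked mechanically on a quoted strike slice. The Python sketch below tests it together with convexity in strike, another standard static no-arbitrage condition on call prices (otherwise a butterfly spread could be bought for a negative price). It is a simplified check at a single expiry that omits calendar-spread and price-bound conditions; the function name and the example quotes are hypothetical.

```python
import numpy as np


def static_arbitrage_violations(strikes, prices, tol=1e-12):
    """Check call prices C(T, K) at a single expiry T for two static-arbitrage conditions.

    strikes must be strictly increasing. Returns a list of (condition, index) pairs,
    where index points at the strike where the violation is detected.
    """
    K = np.asarray(strikes, dtype=float)
    C = np.asarray(prices, dtype=float)
    violations = []
    # Monotonicity: C(T, K1) >= C(T, K2) whenever K1 < K2.
    for i in range(len(K) - 1):
        if C[i] < C[i + 1] - tol:
            violations.append(("monotonicity", i))
    # Convexity: slopes between consecutive strikes must be non-decreasing.
    slopes = np.diff(C) / np.diff(K)
    for i in range(len(slopes) - 1):
        if slopes[i] > slopes[i + 1] + tol:
            violations.append(("convexity", i + 1))
    return violations


strikes = [90.0, 100.0, 110.0, 120.0]
print(static_arbitrage_violations(strikes, [12.0, 6.0, 2.5, 0.8]))  # []
print(static_arbitrage_violations(strikes, [12.0, 5.0, 6.0, 1.0]))
# [('monotonicity', 1), ('convexity', 2)]
```

In the second slice, buying the 100-strike call for 5.0 and selling the 110-strike call for 6.0 earns 1.0 upfront, and the combined payoff at expiry is never negative: exactly the trade described in the text.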

We construct a family of factor-based market models, in which option prices $C$ are represented by a linear combination of factors $\xi$. The factor representation in principle allows exact static calibration, and the joint dynamics of the options follow directly once the dynamics of the factors are specified. The models are then given by a finite system of stochastic differential equations (SDEs) for the factors and the stock price. Importantly, we derive an HJM-type drift condition on the factor SDEs which guarantees freedom from dynamic arbitrage, and we characterise the state space of the market factor processes on which the models are free from static arbitrage. Since static arbitrage constraints are linear inequalities in option prices and the factor representation is linear, this state space is a convex polytope, that is, a shape formed by an intersection of half-spaces. As an example, we take a time series of option and stock prices (simulated from a ground-truth Heston-type stochastic local volatility model) and decode two-dimensional factors from them; the trajectory of the factors and its statically arbitrage-free state space (light green polygon) are displayed in Figure 2.
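Why the admissible factor set is a convex polytope can be seen in a few lines. Assuming, for illustration, that the price vector is $c = g_0 + G\xi$ and that the static constraints are written as $Ac \geq 0$, the constraints on the factors become $(AG)\xi \geq -Ag_0$, an intersection of half-spaces. The Python sketch below uses only the monotonicity constraints for three strikes at one expiry, with hypothetical values for $g_0$ and $G$; the paper's decoded factors and full constraint set differ.

```python
import numpy as np


def monotonicity_constraints(n):
    """Matrix A whose rows encode C[i] - C[i+1] >= 0 for n strikes at one expiry."""
    A = np.zeros((n - 1, n))
    for i in range(n - 1):
        A[i, i], A[i, i + 1] = 1.0, -1.0
    return A


def factor_polytope(A, g0, G):
    """Pull price constraints A @ c >= 0, with c = g0 + G @ xi, back to factor space.

    Returns (M, h) such that the admissible factors are {xi : M @ xi >= h}.
    """
    return A @ G, -(A @ g0)


def is_admissible(M, h, xi, tol=1e-12):
    """True if xi satisfies every half-space inequality M @ xi >= h."""
    return bool(np.all(M @ xi >= h - tol))


# Hypothetical example: 3 strikes, 2 factors.
g0 = np.array([10.0, 6.0, 3.0])  # baseline prices
G = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # factor loadings
M, h = factor_polytope(monotonicity_constraints(3), g0, G)
print(is_admissible(M, h, np.array([0.0, 0.0])))   # True:  prices 10, 6, 3
print(is_admissible(M, h, np.array([0.0, -5.0])))  # False: prices 10, 1, 3
```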



Figure 2: Trajectory (black dots) of the $\mathbb{R}^2$ factors and the corresponding static arbitrage constraints (red dashed lines) projected to the $\mathbb{R}^2$ factor space.

For our models, inference consists of two independent steps: factor decoding and SDE model calibration. The factor decoding step extracts a small number of market factors from the prices of finitely many options. These factors are built to reflect the joint goals of eliminating static and dynamic arbitrage in reconstructed prices and guaranteeing statistical accuracy. In the SDE model calibration step, we represent the drift and diffusion functions by neural networks, giving what is referred to as a neural SDE. Using deep learning algorithms, we train the neural networks by maximising the likelihood of observing the factor paths, subject to the derived arbitrage constraints. This allows calibration and model selection to be done simultaneously.
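One common way to set up such a likelihood, assumed here for illustration rather than taken from the paper, uses an Euler discretisation: each observed increment of the factor path is then approximately Gaussian with mean $\mu(\xi)\Delta t$ and variance $\sigma(\xi)^2\Delta t$. The Python sketch below evaluates the corresponding negative log-likelihood for a diagonal diffusion. Plain Python callables stand in for the drift and diffusion networks, and the arbitrage constraints are not included.

```python
import numpy as np


def euler_nll(path, dt, drift, diffusion):
    """Negative log-likelihood of a discretely observed path under an Euler scheme.

    path: array of shape (n_obs, d). drift(x) and diffusion(x) return arrays of shape (d,);
    diffusion gives the diagonal of sigma, so components are conditionally independent.
    """
    path = np.asarray(path, dtype=float)
    nll = 0.0
    for x_now, x_next in zip(path[:-1], path[1:]):
        mean = x_now + drift(x_now) * dt  # Euler predictor for the next observation
        var = diffusion(x_now) ** 2 * dt  # variance of the Gaussian increment
        resid = x_next - mean
        nll += 0.5 * np.sum(np.log(2.0 * np.pi * var) + resid ** 2 / var)
    return float(nll)


path = [[0.0], [1.0], [2.0]]
unit = lambda x: np.ones_like(x)
zero = lambda x: np.zeros_like(x)
nll_good = euler_nll(path, 1.0, drift=unit, diffusion=unit)  # drift matches the trend
nll_bad = euler_nll(path, 1.0, drift=zero, diffusion=unit)   # drift ignores the trend
```

In a neural SDE, `drift` and `diffusion` would be networks whose weights are chosen to minimise this quantity; here the drift that matches the data simply scores a lower value.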

No-arbitrage conditions are embedded as part of model inference. Specifically, static arbitrage constraints are characterised by a convex polytope state space for the latent factors; we identify sufficient conditions on the drift and diffusion to restrict the factors to their arbitrage-free domain. Consequently, the neural network that is used to parameterise the drift and diffusion functions needs to be constrained. We propose a novel hard constraint approach that modifies the network to respect sufficient conditions on the drift and diffusion to restrain the process within the polytope. The architecture of the modified neural network is plotted in Figure 3.
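The idea of hard-constraining drift and diffusion can be illustrated on the simplest polytope, an axis-aligned box. The Python sketch below is a hypothetical transform, not the operators $\mathcal{G}_\mu$ and $\mathcal{G}_\sigma$ of the paper. Within a distance `eps` of a face it shrinks the diffusion linearly to zero and removes any drift component pointing out of the box, so that on the boundary there is no noise across the face and no outward drift. Conditions of this kind are in the spirit of keeping a continuous-time diffusion inside a closed set; a discretised simulation may still need a small time step.

```python
import numpy as np


def constrain_to_box(mu, sigma, x, lo, hi, eps=0.1):
    """Hypothetical hard constraint for a diagonal-diffusion SDE on the box [lo, hi].

    mu, sigma: unconstrained drift and (diagonal) diffusion evaluated at x,
    e.g. the raw outputs of the drift and diffusion networks.
    Returns the modified (mu, sigma).
    """
    mu, sigma, x = (np.asarray(a, dtype=float) for a in (mu, sigma, x))
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    slack_lo, slack_hi = x - lo, hi - x  # distance to each face, per coordinate
    # Diffusion shrinks linearly to zero as x comes within eps of either face.
    scale = np.clip(np.minimum(slack_lo, slack_hi) / eps, 0.0, 1.0)
    sigma_c = sigma * scale
    # Near a face, drop any drift component pointing out of the box.
    mu_c = np.where(slack_lo < eps, np.maximum(mu, 0.0), mu)
    mu_c = np.where(slack_hi < eps, np.minimum(mu_c, 0.0), mu_c)
    return mu_c, sigma_c


# The first coordinate sits on the lower face; the second is well inside the box.
mu_c, sigma_c = constrain_to_box(
    mu=[-1.0, 1.0], sigma=[0.5, 0.5], x=[0.0, 0.5], lo=[0.0, 0.0], hi=[1.0, 1.0]
)
```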


Figure 3. Constrained neural network. The operators $\mathcal{G}_\mu$ and $\mathcal{G}_\sigma$ transform drift and diffusion functions, respectively, such that the resulting process is restrained within the statically arbitrage-free polytope.

As a demonstration of how well the neural network can learn the factor dynamics as shown in Figure 2, we take the learnt model and simulate time series of the two-dimensional factors. In Figure 4, we compare the empirical distributions of the simulated $\xi_1$ and $\xi_2$ with those of the input data (simulated from a Heston-SLV model).


Figure 4. Comparison of the marginal and joint distributions of the simulated $\xi_1$ and $\xi_2$ with the real distributions (generated from the Heston-SLV model).

We see that the learnt model is capable of generating realistic long time series that are similar to the input data. Through the linear factor representation, it is straightforward to compute call option prices from the simulated factors, and hence other option-derived quantities such as the CBOE VIX volatility index. In Figure 5, we plot the simulated time series of the VIX and the log-return of $S$. We see several occurrences of volatility clustering in the return series, which always coincide with high VIX values.



Figure 5. Simulated time series of the VIX volatility index, together with log-returns of the underlying stock.

In conclusion, we construct market models for the stock and options that permit no arbitrage, allow exact cross-sectional calibration, and reflect stylised facts observed from market price dynamics. Importantly, it is practically convenient to estimate these models, given that observations are discrete time series of prices for a large but finite collection of options. In particular, we exploit the recent successes in the use of neural networks as function approximators in order to give a flexible class of models."

This project was supported by CME Group through the Centre for Doctoral Training in Industrially Focused Mathematical Modelling.

[1] S. N. Cohen, C. Reisinger, and S. Wang. Arbitrage-free neural-SDE market models, 2021

Tuesday, 25 May 2021

A network model of labor market dynamics

Oxford Mathematicians and Economists Maria del Rio-Chanona, Penny Mealy, Mariano Beguerisse-Díaz, François Lafond, and J. Doyne Farmer discuss their network model of labor market dynamics.

"Mathematics has explained many physical, chemical, and biological phenomena, but can it explain how the economy works? This is challenging because the economy is highly diverse and ever-changing, with both short-term fluctuations (it goes through periods of recession and recovery) and long-term structural change (innovation transforms the scope and diversity of what we do).

Take the labor market, for example. Figure 1 shows what we call the occupational mobility network [1]: each node is an occupation, and the links show how likely it is that a worker in one occupation moves to another. Clearly, there are many different occupations, and some occupational transitions are more likely than others. How can we model the dynamics of the labor market while taking this into account?


Figure 1: Occupational mobility network. Nodes represent occupations, and links represent transitions of workers between occupations. The size of the nodes is proportional to the logarithm of the number of employees in each occupation. Nodes are coloured by their broad occupation classification.

We started from a simple model where firms fire (i.e. "separate") and hire workers, and unemployed workers take up "accessible" jobs, that is, job openings that are either in their current occupation or one step away in the network. Our model is a computational, bounded-rational, non-equilibrium model: agents follow simple rules, and we use a computer to simulate what happens. Computational models are very flexible, so they can be constructed around the best available data without the need to over-simplify the phenomenon being modelled. But they are computationally costly to run, and it is sometimes difficult to understand what precise mechanism is responsible for the behaviour of the model.
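The "accessible jobs" rule can be sketched as a small agent-based matching step in Python. Everything specific here is a hypothetical simplification, not the authors' model: each unemployed worker, in random order, picks uniformly among the open vacancies in their own occupation or a neighbouring one, and the network is unweighted, whereas the links of the occupational mobility network carry transition likelihoods.

```python
import random


def match_unemployed(unemployed, vacancies, neighbours, rng):
    """One round of a hypothetical matching rule on an occupational network.

    unemployed: occupation -> number of unemployed workers whose last job was there
    vacancies: occupation -> number of open vacancies
    neighbours: occupation -> set of occupations one step away in the network
    Returns (hires, remaining), where hires[(i, j)] counts workers from i hired into j.
    """
    remaining = dict(vacancies)
    hires = {}
    # One entry per unemployed worker, processed in random order.
    queue = [occ for occ, n in unemployed.items() for _ in range(n)]
    rng.shuffle(queue)
    for i in queue:
        # Accessible jobs: open vacancies in occupation i or in a neighbouring occupation.
        candidates = {i} | neighbours.get(i, set())
        accessible = sorted(j for j in candidates if remaining.get(j, 0) > 0)
        if not accessible:
            continue  # this worker stays unemployed this round
        j = rng.choice(accessible)  # uniform choice: a simplifying assumption
        remaining[j] -= 1
        hires[(i, j)] = hires.get((i, j), 0) + 1
    return hires, remaining


neighbours = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
hires, remaining = match_unemployed(
    {"a": 3, "b": 1, "c": 2}, {"a": 1, "b": 2, "c": 0}, neighbours, random.Random(0)
)
```

Sorting the accessible occupations before the random choice keeps runs reproducible for a fixed seed, since Python set iteration order for strings can vary between interpreter sessions.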

To address these issues, we also created a mathematically tractable model that, under some simplifying assumptions, closely approximates the more complex computational model. For each occupation $i$, the key variables of interest (employment, unemployment and the number of open vacancies) follow the laws of motion \begin{align*} \underbrace{\bar{e}_{i,t+1} - \bar{e}_{i,t}}_{\text{change in employment}} & = - \underbrace{ \Bigg( \delta_u \bar{e}_{i,t} + (1 - \delta_u) \gamma_u \max \big\{0, \bar{d}_{i,t} - d_{i,t}^\dagger \big\} \Bigg)}_{\text{separated workers}} + \underbrace{\sum_j \bar{f}_{ji,t+1}}_{\text{hired workers}}, \\ \underbrace{\bar{u}_{i,t+1}-\bar{u}_{i,t}}_{\text{change in unemployment}} & = \underbrace{ \Bigg( \delta_u \bar{e}_{i,t} + (1 - \delta_u) \gamma_u \max \big\{0, \bar{d}_{i,t} - d_{i,t}^\dagger \big\} \Bigg)}_{\text{separated workers}} - \underbrace{\sum_j \bar{f}_{ij,t+1}}_{\text{transitioning workers}}, \\ \underbrace{\bar{v}_{i,t+1} - \bar{v}_{i,t}}_{\text{change in vacancies}} & = \underbrace{ \Bigg( \delta_v \bar{e}_{i,t} + (1 - \delta_v) \gamma_v \max \big\{0, d_{i,t}^\dagger - \bar{d}_{i,t} \big\} \Bigg)}_{\text{opened vacancies}} - \underbrace{\sum_j \bar{f}_{ji,t+1}}_{\text{hired workers}}. \end{align*}
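These laws of motion can be transcribed directly into code. In the Python sketch below, the hiring flows $\bar{f}_{ij,t+1}$ are passed in as a given matrix, because the matching rule that produces them is not specified here; the function name, argument names and example numbers are hypothetical. A useful sanity check is that total employment plus unemployment is conserved, since every separation moves a worker into unemployment and every hire moves one back.

```python
import numpy as np


def labor_step(e, u, v, d_target, flows, delta_u, delta_v, gamma_u, gamma_v):
    """One step of the deterministic occupation-level laws of motion.

    e, u, v: employment, unemployment and open vacancies per occupation at time t.
    d_target: desired labor demand d^dagger per occupation (taken as given).
    flows[i, j]: workers previously employed in i who are hired into j at t+1 (given).
    """
    e, u, v = (np.asarray(a, dtype=float) for a in (e, u, v))
    d = e + v  # realized labor demand: employment plus open vacancies
    separated = delta_u * e + (1 - delta_u) * gamma_u * np.maximum(0.0, d - d_target)
    opened = delta_v * e + (1 - delta_v) * gamma_v * np.maximum(0.0, d_target - d)
    hired_into = flows.sum(axis=0)  # sum_j f_ji: hires into occupation i
    hired_from = flows.sum(axis=1)  # sum_j f_ij: unemployed from i who find a job
    e_next = e - separated + hired_into
    u_next = u + separated - hired_from
    v_next = v + opened - hired_into
    return e_next, u_next, v_next


e1, u1, v1 = labor_step(
    e=[100.0, 50.0], u=[10.0, 5.0], v=[5.0, 8.0],
    d_target=np.array([110.0, 50.0]), flows=np.array([[2.0, 1.0], [0.0, 3.0]]),
    delta_u=0.01, delta_v=0.01, gamma_u=0.1, gamma_v=0.1,
)
```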

In these equations, the parameters $\delta_u$ and $\delta_v$ quantify how often workers are separated ($\delta_u$) and vacancies are opened ($\delta_v$) for completely random reasons, that is, independently of relative labor demand. The variable $d_{i,t}^\dagger$, which we take as given, represents the level of labor demand desired by employers, given the state of the economy. It is compared with the realized level, defined as the number of people employed plus the number of open vacancies ($d_{i,t} = e_{i,t} + v_{i,t}$). The parameters $\gamma_u$ and $\gamma_v$ then determine how much of the gap is closed during one period: by separating workers if desired labor demand is less than realized ($\gamma_u$), and by opening vacancies if desired labor demand is greater than realized ($\gamma_v$).

The movement of workers across occupations is captured by the last terms, where $f_{ij, t}$ is the number of workers previously employed in occupation $i$ that are hired in occupation $j$ at time $t$. These terms depend on the structure of the occupational mobility network, and on behavioural assumptions about how people look for a job and are matched to a vacancy.

The bars over the symbols for the variables ($e$, $u$, $v$, $d$, $f$) are a reminder that these are not the quantities from the full-fledged computational model, but only approximations (loosely speaking, averages over the ensemble of simulations, in the limit of a large number of agents).

To test whether the model produces reasonable dynamics, we have shown that it is able to reproduce the Beveridge curve, a well-known macroeconomic stylized fact: when more vacancies open, unemployment goes down (Figure 2). As a proof of concept, we have used the model to anticipate the effects of future automation. Using predictions of which occupations robots and algorithms are likely to replace in the future, we used the model to understand the movement of workers in and out of occupations. Our key finding is that unemployment in an occupation depends not only on its direct exposure to automation, but also on its indirect exposure, via occupations close to it. Neighboring occupations that are not hard-hit make it easy for workers to find jobs there, but neighboring occupations that do suffer a large shock create an inflow of job seekers, and thus extra competition. 



Figure 2: The Beveridge curve: the grey line shows the US empirical data between December 2007 and December 2018. The arrows correspond to the numerical solution of the system of deterministic equations, and the solid green lines correspond to ten simulations of the full stochastic model.

As the nature of work keeps evolving, policymakers will need quantitative tools to help them target employment assistance packages and skill development programs to those who need it most. Mathematical models can help with this."

[1] R. Maria del Rio-Chanona, Penny Mealy, Mariano Beguerisse-Díaz, François Lafond, and J. Doyne Farmer. Occupational mobility and automation: a data-driven network model. Journal of The Royal Society Interface, 18(174):20200898, 2021.

Wednesday, 19 May 2021

Oxford Mathematics Online Exhibition 2021

Born out of lockdown in 2020, the Oxford Mathematics Online Exhibition might just have become a permanent fixture in our mathematical lives.

We ask all our Oxford Mathematicians, young and less young, to come up with art that expresses a mathematical idea in the form of their choice.

Image to the right: Joel Madly - Triangular mesh with fractal behaviour

Image below: Andrew Krause - Turing Pattern Faces

Click here for the full exhibition


Thursday, 13 May 2021
Monday, 10 May 2021

Rational Neural Networks

Deep learning has become an important topic across many domains of science due to its recent success in image recognition, speech recognition, and drug discovery. Deep learning techniques are based on neural networks, which consist of a sequence of layers, each applying a mathematical transformation to its input. The output of each layer in the neural network is a nonlinear transformation of its input, $x \mapsto \sigma(W x + b)$, where $W$ is a matrix called the weight matrix, $b$ is a bias vector, and $\sigma$ is a nonlinear function called the activation function.

The weight matrices and bias vectors contain the parameters of the network, which are updated during training so that the network fits some data. In standard image classification problems, the inputs to the network are images, while the outputs are the associated labels. The computational cost of training a neural network depends on its total number of parameters. A key question in designing deep learning architectures is the choice of the activation function.
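The layer map $x \mapsto \sigma(Wx+b)$ and the parameter count can be written out in a few lines of NumPy. This is a minimal sketch: leaving the final layer without an activation is a common convention assumed here, and the weights are arbitrary numbers rather than trained values.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(x, layers, activation=relu):
    """Apply x -> activation(W @ x + b) for each hidden layer; the last layer stays linear."""
    for W, b in layers[:-1]:
        x = activation(W @ x + b)
    W, b = layers[-1]
    return W @ x + b


def count_parameters(layers):
    """Total number of trainable entries in all weight matrices and bias vectors."""
    return sum(W.size + b.size for W, b in layers)


layers = [
    (np.array([[1.0, -1.0], [2.0, 0.0]]), np.array([0.0, -1.0])),  # hidden layer, 2 -> 2
    (np.array([[1.0, 1.0]]), np.array([0.5])),  # output layer, 2 -> 1
]
print(forward(np.array([1.0, 2.0]), layers))  # [1.5]
print(count_parameters(layers))  # 9
```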


Figure 1. A rational activation function (red) initialized close to the ReLU function (blue).

In a recent work, Oxford Mathematicians Nicolas Boullé and Yuji Nakatsukasa, together with Alex Townsend from Cornell University, introduced a novel type of neural network based on rational functions, called rational neural networks [1]. Rational neural networks are neural networks with rational activation functions of the form $\sigma(x)=P(x)/Q(x)$, where $P$ and $Q$ are polynomials. A distinctive feature is that the coefficients of the rational functions are initialized so that $\sigma$ is close to the standard ReLU activation function (see Fig. 1), and are also trainable parameters. This type of network has been proven to have higher approximation power than state-of-the-art neural network architectures, which means that it can tackle a variety of deep learning problems with fewer trainable parameters.
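The sketch below evaluates a rational activation $\sigma(x) = P(x)/Q(x)$ and its partial derivatives with respect to the coefficients of $P$ and $Q$; having these derivatives is what allows the coefficients to be updated by gradient descent alongside the weights and biases. The helper names and the example coefficients (giving $\sigma(x) = x/(1+x^2)$, chosen only because it is easy to check by hand) are assumptions for illustration; they are not the near-ReLU initialization described above.

```python
# Sketch of a rational activation sigma(x) = P(x) / Q(x) with trainable
# coefficients. Coefficient values below are hypothetical placeholders.

def poly(coeffs, x):
    """Evaluate c[0] + c[1] x + ... + c[n] x^n using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def rational_activation(x, p, q):
    """sigma(x) = P(x) / Q(x); p and q hold the polynomial coefficients."""
    return poly(p, x) / poly(q, x)

def rational_grad(x, p, q):
    """Partial derivatives of sigma(x) with respect to each coefficient.

    d sigma / d p_k =  x^k / Q(x)
    d sigma / d q_k = -P(x) x^k / Q(x)^2
    """
    P, Q = poly(p, x), poly(q, x)
    dp = [x**k / Q for k in range(len(p))]
    dq = [-P * x**k / Q**2 for k in range(len(q))]
    return dp, dq

# Example with P(x) = x and Q(x) = 1 + x^2, i.e. sigma(x) = x / (1 + x^2).
p, q = [0.0, 1.0], [1.0, 0.0, 1.0]
print(rational_activation(1.0, p, q))  # 0.5
print(rational_grad(1.0, p, q))        # ([0.5, 0.5], [-0.25, -0.25, -0.25])
```

In a framework such as TensorFlow or PyTorch these derivatives would come from automatic differentiation; writing them out here only makes explicit that $\sigma$ is differentiable in its own coefficients wherever $Q(x) \neq 0$.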



Figure 2. Two-dimensional function learned by a rational neural network (left) and loss function during training compared with a standard architecture (right).

Rational neural networks are particularly suited for regression problems due to the smoothness and approximation power of rational functions (see Fig. 2). Moreover, they are easy to implement in existing deep learning frameworks such as TensorFlow or PyTorch [2]. Finally, while neural networks have applications in diverse fields such as facial recognition, credit-card fraud detection, speech recognition, and medical diagnosis, there is a growing need to understand their approximation power and other theoretical properties. Neural networks, in particular rational neural networks, have the potential to revolutionize fields where mathematical models derived from mechanistic principles are lacking [3].
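As a toy illustration of the regression setting, the sketch below fits the coefficients of a single rational function to samples of $x/(1+x^2)$ by plain gradient descent on the mean squared error. Everything here is an assumption chosen for illustration: the target, the learning rate, the number of steps, and fixing the constant term of $Q$ to 1 as a normalisation. It is not the training setup of [1], which uses rational activations inside full networks.

```python
# Toy regression: fit sigma(x) = P(x) / Q(x) to data by gradient descent.
# Target, learning rate and step count are illustrative assumptions.

def poly(coeffs, x):
    """Evaluate c[0] + c[1] x + ... + c[n] x^n using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def mse_and_grad(p, q, xs, ys):
    """Mean squared error of P/Q on (xs, ys) and its gradient.

    q[0] is held fixed at 1 as a normalisation, so its gradient stays 0.
    """
    n = len(xs)
    loss, gp, gq = 0.0, [0.0] * len(p), [0.0] * len(q)
    for x, y in zip(xs, ys):
        P, Q = poly(p, x), poly(q, x)
        r = P / Q - y                                  # residual at x
        loss += r * r / n
        for k in range(len(p)):
            gp[k] += 2 * r * x**k / Q / n              # d loss / d p_k
        for k in range(1, len(q)):
            gq[k] += -2 * r * P * x**k / Q**2 / n      # d loss / d q_k
    return loss, gp, gq

def fit(p, q, xs, ys, lr=0.1, steps=500):
    """Plain gradient descent; returns final coefficients and loss history."""
    history = []
    for _ in range(steps):
        loss, gp, gq = mse_and_grad(p, q, xs, ys)
        history.append(loss)
        p = [c - lr * g for c, g in zip(p, gp)]
        q = [c - lr * g for c, g in zip(q, gq)]
    return p, q, history

# Synthetic data: 41 samples of x / (1 + x^2) on [-1, 1].
xs = [i / 20 for i in range(-20, 21)]
ys = [x / (1 + x * x) for x in xs]

# Start from sigma(x) = 0.5 x, i.e. P(x) = 0.5 x and Q(x) = 1.
p, q, history = fit([0.0, 0.5], [1.0, 0.0, 0.0], xs, ys)
print(f"initial loss {history[0]:.2e}, final loss {history[-1]:.2e}")
print("P coefficients:", p, "Q coefficients:", q)
```

Because the target is itself a rational function of the chosen degrees, the recorded loss history should decrease as the coefficients move toward $P(x) = x$, $Q(x) = 1 + x^2$; a polynomial model of the same size could only approximate it.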

[1] Boullé, Nakatsukasa, Townsend. Rational neural networks. NeurIPS 33, 2020.

[2] GitHub repository, 2020.

[3] Boullé, Earls, Townsend. Data-driven discovery of physical laws with human-understandable deep learning. arXiv:2105.00266, 2021.

Monday, 10 May 2021

Mathematical Institute Athena Swan Silver Award renewed

The Athena Swan Charter is a framework which is used across the globe to support and transform gender equality within higher education (HE) and research. 

In 2013 the Mathematical Institute received an Athena Swan Bronze Award for its work in addressing gender equality in its subject and working environment and, as a result of our work over the next four years, in 2017 we received the Silver Award. This year that Silver Award has been renewed, reflecting the work put in as we strive for gender equality in a subject that, while still predominantly male, is becoming more balanced.

What have we done? 

- We have increased gender diversity across most student/staff groups, with female postgraduate numbers almost doubling. Given that these students are the faculty of the future, in Oxford and elsewhere, this is very encouraging. We will continue to work on providing an environment that encourages them to progress in their careers.

- We are pleased with the ongoing success of our recent, prestigious Hooke and Titchmarsh Postdoctoral Research Fellowships, which attract a high number of women and provide an exceptional springboard into an independent academic career. 

- We continue to work hard to engage and encourage young school-age female mathematicians. It All Adds Up, where we bring female school pupils together to meet other keen young mathematicians, is just one successful example. We are approaching 30% female undergraduate intake, in line with the number of female high-school students studying Further Maths at 'A' Level.

Charters such as Athena Swan work best when they make you think about what you do rather than being an end in themselves. That has perhaps been the most successful part of Athena Swan for us. We are integrating it into our strategic priorities and intertwining it with, for example, our Race Equality action plan. Mathematics is a subject that had a lot of ground to make up, but we and the wider mathematical world are making meaningful progress.
