Tue, 20 Feb 2024

14:30 - 15:00
L6

CMA Light: A novel Minibatch Algorithm for large-scale non convex finite sum optimization

Corrado Coppola
(Sapienza University of Rome)
Abstract
The supervised training of a deep neural network on a given dataset consists of the unconstrained minimization of a finite sum of continuously differentiable functions, commonly referred to as the loss with respect to the samples. These functions depend on the network parameters and are, in most cases, non-convex. We develop CMA Light, a new globally convergent mini-batch gradient method to tackle this problem. We consider the recently introduced Controlled Minibatch Algorithm (CMA) framework and we overcome its main bottleneck, removing the need for at least one evaluation of the whole objective function per iteration. We prove global convergence of CMA Light under mild assumptions and we discuss extensive computational results on the same experimental test bed used for CMA, showing that CMA Light requires less computational effort than most of the state-of-the-art optimizers. Finally, we present early results on a large-scale image classification task.
 
The reference pre-print is already on arXiv at https://arxiv.org/abs/2307.15775
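
To make the finite-sum setting of the abstract concrete, the following is a minimal sketch of a plain mini-batch gradient loop. It is not CMA Light and does not implement any of its safeguards; the function name `minibatch_gradient_descent`, the toy data, the step size, and the fixed (non-shuffled) batch order are all illustrative assumptions. The toy loss is also convex, unlike the losses the talk targets.

```python
def minibatch_gradient_descent(grad_i, w0, n_samples, batch_size, lr, epochs):
    """Plain mini-batch gradient descent on f(w) = (1/N) * sum_i f_i(w).

    grad_i(w, i) returns the gradient of the i-th term f_i at w.
    Batches are visited in a fixed order; no full-objective evaluation is used.
    """
    w = w0
    for _ in range(epochs):
        for start in range(0, n_samples, batch_size):
            batch = range(start, min(start + batch_size, n_samples))
            # Average the per-sample gradients over the current mini-batch.
            g = sum(grad_i(w, i) for i in batch) / len(batch)
            w -= lr * g
    return w


# Toy problem (assumed for illustration): f_i(w) = 0.5 * (w * x_i - y_i)^2,
# with data generated exactly by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def grad_i(w, i):
    return (w * xs[i] - ys[i]) * xs[i]


w_fit = minibatch_gradient_descent(grad_i, 0.0, len(xs), 2, 0.05, 50)
```

Each update touches only `batch_size` terms of the sum, which is the per-iteration cost the mini-batch setting is designed to keep.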
Tue, 20 Feb 2024

14:00 - 14:30
L6

Tensor Methods for Nonconvex Optimization using Cubic-quartic regularization models

Wenqi Zhu
(Mathematical Institute (University of Oxford))
Abstract

High-order tensor methods for solving both convex and nonconvex optimization problems have recently generated significant research interest, due in part to the natural way in which higher derivatives can be incorporated into adaptive regularization frameworks, leading to algorithms with optimal global rates of convergence and local rates that are faster than Newton's method. On each iteration, to find the next solution approximation, these methods require the unconstrained local minimization of a (potentially nonconvex) multivariate polynomial of degree higher than two, constructed using third-order (or higher) derivative information, and regularized by an appropriate power of the change in the iterates. Developing efficient techniques for the solution of such subproblems is an ongoing topic of research, and this talk addresses this question for the case of the third-order tensor subproblem.


In particular, we propose the CQR algorithmic framework for minimizing a nonconvex Cubic multivariate polynomial with Quartic Regularisation, by minimizing a sequence of local quadratic models that also incorporate both simple cubic and quartic terms. The role of the cubic term is to crudely approximate local tensor information, while the quartic one provides model regularization and controls progress. We provide necessary and sufficient optimality conditions that fully characterise the global minimizers of these cubic-quartic models. We then turn these conditions into secular equations that can be solved using nonlinear eigenvalue techniques. We show, using our optimality characterisations, that a CQR algorithmic variant has the optimal-order evaluation complexity of $O(\epsilon^{-3/2})$ when applied to minimizing our quartically-regularised cubic subproblem, which can be further improved in special cases. We propose practical CQR variants that judiciously use local tensor information to construct the local cubic-quartic models. We test these variants numerically and observe them to be competitive with ARC and other subproblem solvers on typical instances and even superior on ill-conditioned subproblems with special structure.

Tue, 06 Feb 2024

14:30 - 15:00
L6

Computing $H^2$-conforming finite element approximations without having to implement $C^1$-elements

Charlie Parker
(Mathematical Institute (University of Oxford))
Abstract

Fourth-order elliptic problems arise in a variety of applications from thin plates to phase separation to liquid crystals. A conforming Galerkin discretization requires a finite dimensional subspace of $H^2$, which in turn means that conforming finite element subspaces are $C^1$-continuous. In contrast to standard $H^1$-conforming $C^0$ elements, $C^1$ elements, particularly those of high order, are less understood from a theoretical perspective and are not implemented in many existing finite element codes. In this talk, we address the implementation of the elements. In particular, we present algorithms that compute $C^1$ finite element approximations to fourth-order elliptic problems and which only require elements with at most $C^0$-continuity. We also discuss solvers for the resulting subproblems and illustrate the method on a number of representative test problems.

Tue, 06 Feb 2024

14:00 - 14:30
L6

Fast High-Order Finite Element Solvers on Simplices

Pablo Brubeck Martinez
(Mathematical Institute (University of Oxford))
Abstract

We present new high-order finite elements discretizing the $L^2$ de Rham complex on triangular and tetrahedral meshes. The finite elements discretize the same spaces as usual, but with different basis functions. They allow for fast linear solvers based on static condensation and space decomposition methods.

The new elements build upon the definition of degrees of freedom given by Demkowicz et al. (De Rham diagram for $hp$ finite element spaces, Comput. Math. Appl., 39(7-8):29-38, 2000), and consist of integral moments on a symmetric reference simplex with respect to a numerically computed polynomial basis that is orthogonal in both the $L^2$- and $H(\mathrm{d})$-inner products ($\mathrm{d} \in \{\mathrm{grad}, \mathrm{curl}, \mathrm{div}\}$).

On the reference symmetric simplex, the resulting stiffness matrix has a diagonal interior block and does not couple the interior and interface degrees of freedom. Thus, on the reference simplex, the Schur complement resulting from elimination of the interior degrees of freedom is simply the interface block itself.
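
The Schur complement statement above can be checked on a toy example. The sketch below assumes a block matrix with a diagonal interior block `diag(d)`, interior-interface blocks `A_IG` and `A_GI`, and an interface block `A_GG`; these names and the small matrices are illustrative and are not taken from the talk.

```python
def schur_complement_diag_interior(d, A_IG, A_GI, A_GG):
    """Return S = A_GG - A_GI * diag(d)^{-1} * A_IG.

    d     : diagonal entries of the interior block (all nonzero)
    A_IG  : interior-by-interface coupling block (list of rows)
    A_GI  : interface-by-interior coupling block (list of rows)
    A_GG  : square interface block (list of rows)
    """
    m = len(A_GG)
    S = [row[:] for row in A_GG]  # copy so the input block is not modified
    for i in range(m):
        for j in range(m):
            S[i][j] -= sum(A_GI[i][k] * A_IG[k][j] / d[k] for k in range(len(d)))
    return S


d = [2.0, 3.0]
A_GG = [[4.0, 1.0], [1.0, 5.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]

# No interior-interface coupling: the Schur complement is the interface block.
S_uncoupled = schur_complement_diag_interior(d, zero, zero, A_GG)

# A nonzero coupling (as a geometric mapping might introduce) changes it.
coupling = [[1.0, 0.0], [0.0, 0.0]]
S_coupled = schur_complement_diag_interior(d, coupling, coupling, A_GG)
```

With zero coupling the subtraction term vanishes identically, so no interior solve is needed to form the interface operator.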

This sparsity is not preserved on arbitrary cells mapped from the reference cell. Nevertheless, the interior-interface coupling is weak because it is only induced by the geometric transformation. We devise a preconditioning strategy by neglecting the interior-interface coupling. We precondition the interface Schur complement with the interface block, and simply apply point-Jacobi to precondition the interior block.

The combination of this approach with a space decomposition method on small subdomains constructed around vertices, edges, and faces allows us to efficiently solve the canonical Riesz maps in $H^1$, $H(\mathrm{curl})$, and $H(\mathrm{div})$, at very high order. We empirically demonstrate iteration counts that are robust with respect to the polynomial degree.

Fri, 08 Mar 2024
16:00
L1

Maths meets Stats

James Taylor (Mathematical Institute) and Anthony Webster (Department of Statistics)
Abstract

Speaker: James Taylor
Title: D-Modules and p-adic Representations

Abstract: The representation theory of finite groups is a beautiful and well-understood subject. However, when one considers more complicated groups things become more interesting, and to classify their representations is often a much harder problem. In this talk, I will introduce the classical theory, the particular groups I am interested in, and explain how one might hope to understand their representations through the use of D-modules - the algebraic incarnation of differential equations.

 

Speaker: Anthony Webster
Title: An Introduction to Epidemiology and Causal Inference

Abstract: This talk will introduce epidemiology and causal inference from the perspective of a statistician and former theoretical physicist. Despite their studies being underpinned by deep and often complex mathematics, epidemiologists are generally more concerned with seemingly mundane information about the relationships between potential risk factors and disease. Because of this, I will argue that a good epidemiologist with minimal statistical knowledge will often do better than a highly trained statistician. I will also argue that causal assumptions are a necessary part of epidemiology, should be made more explicitly, and allow a much wider range of causal inferences to be explored. In the process, I will introduce ideas from epidemiology and causal inference such as Mendelian Randomisation and the "do calculus", methodological approaches that will increasingly underpin data-driven population research.

Fri, 26 Jan 2024
16:00
L1

North meets South

Dr Cedric Pilatte (North Wing) and Dr Boris Shustin (South Wing)
Abstract

Speaker: Cedric Pilatte 
Title: Convolution of integer sets: a galaxy of (mostly) open problems

Abstract: Let S be a set of integers. Define f_k(n) to be the number of representations of n as the sum of k elements from S. Behind this simple definition lie fascinating conjectures that are very easy to state but seem unattackable. For example, a famous conjecture of Erdős and Turán predicts that if f_2 is bounded then it has infinitely many zeroes. This talk is designed as an accessible overview of these questions. 
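
As a small illustration of the definition of f_k(n), the sketch below computes it by repeated convolution of the indicator function of S. It assumes that S contains only non-negative integers and that representations are counted as ordered k-tuples; both are assumptions made for this example, and the function name `representation_counts` is hypothetical.

```python
def representation_counts(S, k, n_max):
    """Return [f_k(0), ..., f_k(n_max)] for a set S of non-negative integers.

    f_k(n) counts ordered k-tuples (s_1, ..., s_k) of elements of S
    with s_1 + ... + s_k = n, i.e. the k-fold convolution of the
    indicator function of S evaluated at n.
    """
    indicator = [1 if n in S else 0 for n in range(n_max + 1)]
    counts = [1] + [0] * n_max  # k = 0: only the empty sum, which equals 0
    for _ in range(k):
        new_counts = [0] * (n_max + 1)
        for a, c in enumerate(counts):
            if c == 0:
                continue
            # Append one more element b of S, keeping the total within n_max.
            for b in range(n_max + 1 - a):
                if indicator[b]:
                    new_counts[a + b] += c
        counts = new_counts
    return counts


f2 = representation_counts({0, 1, 3}, 2, 6)
```

For S = {0, 1, 3} the value f_2(5) is zero, since no two elements of S sum to 5.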
 
Speaker: Boris Shustin

Title: Manifold-Free Riemannian Optimization

Abstract: Optimization problems constrained to a smooth manifold can be solved via the framework of Riemannian optimization. To that end, a geometrical description of the constraining manifold, e.g., tangent spaces, retractions, and cost function gradients, is required. In this talk, we present a novel approach that allows performing approximate Riemannian optimization based on a manifold learning technique, in cases where only a noiseless sample set of the cost function and the manifold’s intrinsic dimension are available.

New College invites applications for this post, which is tenable for a fixed period of three years from 1 October 2024. The person appointed will be expected to undertake their own independent and original academic research in Mathematics. The Fellowship is open to those who have already acquired a first degree, and who at the time of appointment have completed at least two years’ study for a PhD/DPhil.

Tue, 07 May 2024

14:00 - 14:30
L3

The Approximation of Singular Functions by Series of Non-integer Powers

Mohan Zhao
(University of Toronto)
Abstract
In this talk, we describe an algorithm for approximating functions of the form $f(x) = \langle \sigma(\mu),x^\mu \rangle$ over the interval $[0,1]$, where $\sigma(\mu)$ is some distribution supported on $[a,b]$, with $0<a<b<\infty$. Given a desired accuracy and the values of $a$ and $b$, our method determines a priori a collection of non-integer powers, so that functions of this form are approximated by expansions in these powers, and a set of collocation points, such that the expansion coefficients can be found by collocating a given function at these points. Our method has a small uniform approximation error which is proportional to the desired accuracy multiplied by some small constants, and the number of singular powers and collocation points grows logarithmically with the desired accuracy. This method has applications to the solution of partial differential equations on domains with corners.