The Mathematical Institute, University of Oxford, proposes to appoint a Departmental Lecturer in Mathematical and Computational Finance, from 1st September 2026 or as soon as possible thereafter. The appointment will be for a fixed period of 3 years.
As Departmental Lecturer, you will engage in advanced study and academic research in applied mathematics with a focus on mathematical and computational finance, as part of the Mathematical and Computational Finance Research Group led by Professor Rama Cont.
16:00
We invite applications from talented postdoctoral researchers for a Hooke/Titchmarsh Research Fellowship in Complex Systems. This is a fixed-term position for 3 years at the University of Oxford. The successful candidate must have a PhD (or be close to completion) in mathematics or physics and a record of outstanding research in the mathematical theory of Complex Systems, including Random Matrix Theory and its Applications, and Statistical Mechanics, interpreted broadly.
Understanding and Improving LLM Training via Hessian and Spectral Analysis
Abstract
Professor Ruoyu Sun will talk about: 'Understanding and Improving LLM Training via Hessian and Spectral Analysis'
In the first part, we investigate the approximate block-diagonal Hessian structure of neural networks. We identify the conditions under which this structure emerges and give the first rigorous proofs based on random matrix theory. From this structural perspective, we explain why Adam works far better than SGD on Transformers. Following this structural guideline, we design the memory-efficient optimizer Adam-mini; Normuon is another optimizer developed under the same principle.
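The link between block structure and memory saving can be illustrated with a minimal sketch. It assumes that an Adam-mini-style update keeps one second-moment scalar per parameter block (the mean of the squared gradients in that block) in place of Adam's one value per coordinate. The function name `adam_mini_step`, the block partition and the hyperparameter defaults are hypothetical choices for illustration, not the speaker's implementation.

```python
import numpy as np

def adam_mini_step(params, grads, m, v, step,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update in which every block shares a single second-moment scalar.

    params, grads, m : lists of arrays, one array per parameter block
    v                : list of floats, one scalar per block (the memory saving)
    step             : 1-based iteration count, used for bias correction
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_b, v_b in zip(params, grads, m, v):
        # First moment is kept per coordinate, exactly as in Adam.
        m_b = beta1 * m_b + (1 - beta1) * g
        # Second moment is ONE number per block: the mean squared gradient.
        v_b = beta2 * v_b + (1 - beta2) * float(np.mean(g * g))
        m_hat = m_b / (1 - beta1 ** step)
        v_hat = v_b / (1 - beta2 ** step)
        new_params.append(p - lr * m_hat / (np.sqrt(v_hat) + eps))
        new_m.append(m_b)
        new_v.append(v_b)
    return new_params, new_m, new_v

# Hypothetical partition: two blocks of sizes 2 and 1.
params = [np.array([1.0, 2.0]), np.array([3.0])]
grads = [np.array([3.0, 4.0]), np.array([2.0])]
m = [np.zeros_like(p) for p in params]
v = [0.0 for _ in params]
params, m, v = adam_mini_step(params, grads, m, v, step=1)
```

In this sketch the second-moment state holds as many scalars as there are blocks, whereas the first-moment state still matches the parameter count. How the blocks should be chosen is the structural question the abstract addresses and is left open here.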
In the second part, we adopt a spectral perspective to study and refine normalization layers for neural network training. We propose a preconditioning (PC) layer, an advanced weight-centric module built with low-degree polynomial preconditioning for scalable spectral control. Theoretically, for deep linear networks, we prove that bounding each layer's singular values ensures geometric convergence of gradient descent to global minima. Empirically, PC delivers consistent efficiency gains over a standard Transformer baseline in Llama2-1B pretraining.
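To make the idea of polynomial spectral control concrete, here is a minimal sketch. The abstract does not specify the polynomial used in the PC layer, so this example swaps in the well-known Newton-Schulz cubic p(X) = 1.5 X - 0.5 X Xᵀ X as a stand-in. The function name `poly_precondition` and the spectral-norm rescaling are assumptions made for illustration only.

```python
import numpy as np

def poly_precondition(W, steps=1):
    """Push the singular values of W toward 1 using only matrix products.

    If X = U diag(s) V^T, then X X^T X = U diag(s**3) V^T, so each step maps
    every singular value s to f(s) = 1.5*s - 0.5*s**3 without computing an SVD.
    """
    # Rescale so the largest singular value is 1 (assumed normalization).
    X = W / np.linalg.norm(W, ord=2)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

s_before = np.linalg.svd(W / np.linalg.norm(W, ord=2), compute_uv=False)
s_after = np.linalg.svd(poly_precondition(W), compute_uv=False)

# Condition number (largest / smallest singular value) before and after.
print(s_before[0] / s_before[-1], s_after[0] / s_after[-1])
```

On the interval [0, 1] the map f is increasing, fixes 1, and satisfies f(s) ≥ s, so the largest singular value stays at 1 while the smallest grows and the condition number shrinks. This only illustrates how a low-degree polynomial can bound a layer's singular values; it is not the PC layer described in the talk.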