We invite applications from talented postdoctoral researchers for a Hooke/Titchmarsh Research Fellowship in Complex Systems. This is a three-year fixed-term position at the University of Oxford. The successful candidate must hold a PhD (or be close to completing one) in mathematics or physics and have a record of outstanding research in the mathematical theory of Complex Systems, including Random Matrix Theory and its Applications, and Statistical Mechanics, interpreted broadly.

The role of phenotypic heterogeneity in collective cell migration
Crossley, R
Fractonic solids
Jain, A Physical Review D volume 113 issue 10 (21 May 2026)
Statistics and asymptotics of subdivergence-free Feynman integrals in ϕ4 theory
Balduf, P Shaban, K Thürigen, J Proceedings of Science volume 485 (11 Mar 2026)
Vicky Neale, 1984–2023
Moulton, D Kirwan, F Bulletin of the London Mathematical Society volume 58 issue 5 (11 May 2026)
Tue, 09 Jun 2026

13:00 - 14:00
Lecture Room 6

Understanding and Improving LLM Training via Hessian and Spectral Analysis

Professor Ruoyu Sun
(The Chinese University of Hong Kong, Shenzhen)
Abstract


In the first part, we investigate the approximate block-diagonal Hessian structure of neural networks. We identify the conditions under which this structure emerges and give the first rigorous proofs based on random matrix theory. From this structural perspective, we explain why Adam works far better than SGD on Transformers. Following this structural guideline, we design the memory-efficient optimizer Adam-mini; Normuon is another optimizer developed under the same principle.
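The memory saving behind a block-wise optimizer can be illustrated with a minimal sketch. It assumes, as a simplification, that each parameter block shares one second-moment scalar (the mean squared gradient over the block) in place of Adam's per-coordinate value. The function name `blockwise_adam_step`, the dictionary layout, and the block names are hypothetical; this is not the Adam-mini implementation, and how blocks are chosen from the Hessian structure is not covered here.

```python
import math


def blockwise_adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update in which every block shares a single second moment.

    params, grads: dict mapping a (hypothetical) block name to a list of floats.
    state: dict holding the step count "t", per-coordinate first moments "m",
           and ONE second-moment scalar per block in "v".
    """
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    m_all = state.setdefault("m", {})
    v_all = state.setdefault("v", {})
    for name, p in params.items():
        g = grads[name]
        m = m_all.setdefault(name, [0.0] * len(p))
        # Mean squared gradient over the block replaces the per-coordinate g**2.
        mean_sq = sum(x * x for x in g) / len(g)
        v_all[name] = beta2 * v_all.get(name, 0.0) + (1 - beta2) * mean_sq
        v_hat = v_all[name] / (1 - beta2 ** t)  # bias correction
        denom = math.sqrt(v_hat) + eps
        for i in range(len(p)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            p[i] -= lr * m_hat / denom
    return params


params = {"block_a": [1.0, 2.0], "block_b": [0.5]}
grads = {"block_a": [3.0, 4.0], "block_b": [2.0]}
state = {}
blockwise_adam_step(params, grads, state)
```

In this sketch the second-moment state holds one float per block rather than one per parameter, which is where the memory reduction would come from.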

In the second part, we adopt a spectral perspective to study and refine normalization layers for neural network training. We propose a preconditioning (PC) layer, an advanced weight-centric module built with low-degree polynomial preconditioning for scalable spectral control. Theoretically, for deep linear networks, we prove that bounding each layer's singular values ensures geometric convergence of gradient descent to global minima. Empirically, PC delivers consistent efficiency gains over a standard Transformer baseline in Llama2-1B pretraining.
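How a low-degree matrix polynomial can control singular values may be easier to see in a small sketch. The abstract does not say which polynomial the PC layer uses, so the cubic below, p(W) = 1.5 W - 0.5 W Wᵀ W (the classical Newton-Schulz step), is only a stand-in. It maps each singular value s to 1.5 s - 0.5 s³, which moves values in (0, √3) toward 1. The helper names `matmul`, `transpose`, and `cubic_precondition` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def cubic_precondition(w, steps=1):
    """Apply p(W) = 1.5 W - 0.5 W W^T W repeatedly.

    With W = U S V^T, the product W W^T W equals U S^3 V^T, so each
    singular value s is mapped to 1.5 s - 0.5 s^3 at every step.
    """
    for _ in range(steps):
        wwtw = matmul(matmul(w, transpose(w)), w)
        w = [[1.5 * w[i][j] - 0.5 * wwtw[i][j] for j in range(len(w[0]))]
             for i in range(len(w))]
    return w


# A diagonal matrix makes the singular values (0.5 and 1.2) easy to read off.
w = [[0.5, 0.0], [0.0, 1.2]]
once = cubic_precondition(w, steps=1)
many = cubic_precondition(w, steps=10)
```

After one step the diagonal entries are 0.6875 and 0.936, and repeated steps bring both close to 1. Only matrix products are needed, with no explicit singular value decomposition, which is one reason polynomial schemes of this kind are attractive at scale.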

Safety and Efficacy in the Transcortical and Transsylvian Approach in Insular High-Grade Gliomas: A Comparative Series of 58 Patients.
Morello, A Rizzo, F Gatto, A Panico, F Bianconi, A Chiari, G Armocida, D Greco Crasto, S Melcarne, A Zenga, F Rudà, R Morana, G Garbossa, D Cofano, F Current oncology (Toronto, Ont.) volume 32 issue 2 98 (10 Feb 2025)
Accuracy and Safety Between Robot-Assisted and Conventional Freehand Fluoroscope-Assisted Placement of Pedicle Screws in Thoracolumbar Spine: Meta-Analysis.
Morello, A Colonna, S Lo Bue, E Chiari, G Mai, G Pesaresi, A Garbossa, D Cofano, F Medicina (Kaunas, Lithuania) volume 61 issue 4 690 (09 Apr 2025)