Tue, 09 Jun 2026

13:00 - 14:00
Lecture Room 6

Understanding and Improving LLM Training via Hessian and Spectral Analysis

Professor Ruoyu Sun
(The Chinese University of Hong Kong, Shenzhen)
Abstract

Professor Ruoyu Sun will talk about 'Understanding and Improving LLM Training via Hessian and Spectral Analysis'.

In the first part, we investigate the approximately block-diagonal Hessian structure of neural networks. We identify the conditions under which this structure emerges and provide the first rigorous proofs, based on random matrix theory. From this structural perspective, we explain why Adam works far better than SGD on Transformers. Guided by this structure, we design Adam-mini, a memory-efficient optimizer; NorMuon is another optimizer developed under the same principle.
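To make the block-wise idea concrete, here is a minimal sketch of an Adam-mini-style step, assuming NumPy. It keeps Adam's per-entry first moment but stores a single second-moment scalar per parameter block (the block's mean squared gradient), which is where the memory saving comes from. The function name `adam_mini_step`, the state layout, and the block names `attn.q` and `mlp.w` are hypothetical. For brevity each tensor is treated as one block; the actual Adam-mini partition follows the Hessian block structure described above and is not reproduced here.

```python
import numpy as np

def adam_mini_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative Adam-mini-style update (hypothetical helper, not the reference code):
    Adam's first moment per entry, but ONE second-moment scalar per parameter block."""
    state["t"] += 1
    t = state["t"]
    for name, g in grads.items():
        # First moment: same as Adam, one value per parameter entry.
        m = state["m"].setdefault(name, np.zeros_like(g))
        m[:] = beta1 * m + (1 - beta1) * g
        # Second moment: a single scalar per block, from the block's mean squared gradient.
        v = beta2 * state["v"].get(name, 0.0) + (1 - beta2) * float(np.mean(g * g))
        state["v"][name] = v
        # Bias correction, as in Adam.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # In-place update: every entry in the block shares the same effective learning rate.
        params[name] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params

# Toy usage: two hypothetical blocks, loss f = sum of squares (gradient 2 * x).
rng = np.random.default_rng(0)
params = {"attn.q": rng.standard_normal((4, 4)), "mlp.w": rng.standard_normal((4, 8))}
state = {"t": 0, "m": {}, "v": {}}
loss_before = sum(float(np.sum(p ** 2)) for p in params.values())
grads = {k: 2 * p for k, p in params.items()}
adam_mini_step(params, grads, state)
loss_after = sum(float(np.sum(p ** 2)) for p in params.values())
n_adam = sum(p.size for p in params.values())
print(len(state["v"]), "second-moment scalars vs", n_adam, "for Adam")  # 2 ... 48
print(loss_after < loss_before)  # True
```

Adam would store 48 second-moment values for these two tensors; the sketch stores 2. With a finer, Hessian-guided partition the count would be larger but still far below one value per parameter.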

In the second part, we adopt a spectral perspective to study and refine normalization layers for neural network training. We propose a preconditioning (PC) layer, a weight-centric module built on low-degree polynomial preconditioning for scalable spectral control. Theoretically, for deep linear networks, we prove that bounding each layer's singular values ensures geometric convergence of gradient descent to global minima. Empirically, PC delivers consistent efficiency gains over a standard Transformer baseline in Llama2-1B pretraining.
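The abstract does not specify the PC layer's polynomial, so the sketch below is only an illustration of the general principle of controlling a weight matrix's spectrum with matrix products alone (no SVD). As a stand-in it uses the classic cubic Newton-Schulz map p(X) = 1.5 X - 0.5 X X^T X, which acts on each singular value as s -> 1.5 s - 0.5 s^3: after scaling so all singular values lie in (0, 1], every step keeps them bounded by 1 and enlarges small ones relatively more, shrinking the condition number. The function name `poly_spectral_control` and the choice of polynomial are assumptions, not the PC layer itself.

```python
import numpy as np

def poly_spectral_control(W, steps=3):
    """Push singular values of W toward 1 using only matrix products (no SVD).

    Hypothetical stand-in: repeated cubic Newton-Schulz map p(X) = 1.5 X - 0.5 X X^T X.
    """
    # Spectral norm <= Frobenius norm, so this puts every singular value in (0, 1].
    X = W / np.linalg.norm(W)
    for _ in range(steps):
        # With X = U S V^T, X X^T X = U S^3 V^T, so each singular value maps
        # s -> 1.5 s - 0.5 s^3: stays in (0, 1], small s grow relatively more.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
s_in = np.linalg.svd(W, compute_uv=False)
s_out = np.linalg.svd(poly_spectral_control(W), compute_uv=False)
print("condition number before:", s_in.max() / s_in.min())
print("condition number after: ", s_out.max() / s_out.min())
```

Each step is a degree-3 polynomial in X, and the singular vectors are unchanged; only the singular values are reshaped, which is the kind of bounded-spectrum property the convergence result above relies on.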
