Part 1: In the first part of this talk, we develop a convergence analysis for training neural PDE models in the overparameterized limit. Many engineering and scientific fields have recently become interested in modelling terms in PDEs with neural networks (NNs) in order to approximate missing or unresolved physics, which requires solving the inverse problem of learning the NN terms from observed data. The resulting neural PDE model is a function of the NN parameters and can be calibrated to the available ground-truth data by gradient descent on a data-misfit objective, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. We study the convergence of this adjoint gradient descent optimization method for training neural PDE models in the limit where both the number of hidden units and the training time tend to infinity, and prove convergence of the trained neural PDE solution to the target data.
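To make the adjoint gradient concrete, the following is a schematic for a model problem; the particular PDE, loss, and notation are illustrative assumptions rather than the specific setting analyzed in the talk. Suppose the state $u(x,t;\theta)$ solves $\partial_t u = \mathcal{N}(u) + f_\theta(u)$ on $[0,T]$, where $f_\theta$ is the NN term, and the calibration objective is the terminal-time misfit $J(\theta) = \tfrac{1}{2}\int |u(x,T;\theta) - u^{\mathrm{data}}(x)|^2\,dx$. Introducing an adjoint variable $p$ that solves the backward-in-time linear PDE
\[
-\partial_t p = \big(\partial_u \mathcal{N}(u) + \partial_u f_\theta(u)\big)^{*}\, p, \qquad p(\cdot,T) = u(\cdot,T;\theta) - u^{\mathrm{data}},
\]
the full parameter gradient is obtained from one forward solve and one adjoint solve via
\[
\nabla_\theta J(\theta) = \int_0^T\!\!\int \big(\partial_\theta f_\theta(u)\big)^{*}\, p \; dx\, dt,
\]
which is what makes the adjoint approach computationally efficient compared to differentiating the PDE solution with respect to each parameter separately.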
Part 2: For the second part, we turn to a convergence analysis of the regularized Newton method for training NNs in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). We provide explicit rates characterizing this convergence and, in the infinite-width limit, prove that the NN converges exponentially fast to the target data. We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The mathematical challenges addressed in our analysis include the implicit parameter update of the Newton method with a potentially indefinite Hessian matrix and the fact that the dimension of the associated linear system of equations grows to infinity with the NN width.
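For orientation, here is a generic sketch of the update and its kernel description; the exact regularization and the precise form of the NNTK in our analysis are not reproduced here, so the expressions below are assumptions for illustration. With NN output $g(x;\theta)$ and empirical loss $J(\theta) = \tfrac{1}{2}\sum_{i=1}^{n}\big(g(x_i;\theta) - y_i\big)^2$, a regularized Newton step reads
\[
\theta_{k+1} = \theta_k - \big(\nabla^2_\theta J(\theta_k) + \lambda I\big)^{-1} \nabla_\theta J(\theta_k), \qquad \lambda > 0,
\]
where the shift $\lambda I$ compensates for a possibly indefinite Hessian and the update is implicit in the sense that it requires solving a linear system whose dimension equals the number of parameters, which grows with the NN width. In the infinite-width limit, the output evolution is then governed by a kernel equation of the schematic form
\[
\frac{d}{dt}\, g_t(x) = -\sum_{i=1}^{n} K^{\mathrm{NNTK}}_t(x, x_i)\,\big(g_t(x_i) - y_i\big),
\]
with $K^{\mathrm{NNTK}}$ the Newton analogue of the neural tangent kernel; the uniform-in-frequency exponential convergence stated above refers to the decay of $g_t - y$ under such limiting dynamics.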