14:00
Forthcoming events in this series
14:00
Solving (algebraic problems from) PDEs; a personal perspective
Abstract
We are now able to solve many partial differential equation problems that were well beyond reach when I started in academia. Some of this success is due to computer hardware but much is due to algorithmic advances.
I will give a personal perspective of the development of computational methodology in this area over my career thus far.
Unleashing the Power of Deeper Layers in LLMs
Abstract
Large Language Models (LLMs) have demonstrated impressive achievements. However, recent research has shown that their deeper layers often contribute minimally, with effectiveness diminishing as layer depth increases. This pattern presents significant opportunities for model compression.
In the first part of this seminar, we will explore how this phenomenon can be harnessed to improve the efficiency of LLM compression and parameter-efficient fine-tuning. Despite these opportunities, the underutilization of deeper layers leads to inefficiencies, wasting resources that could be better used to enhance model performance.
The second part of the talk will address the root cause of this ineffectiveness in deeper layers and propose a solution. We identify the issue as stemming from the prevalent use of Pre-Layer Normalization (Pre-LN) and introduce Mix-Layer Normalization (Mix-LN) with combined Pre-LN and Post-LN as a new approach to mitigate this training deficiency.
Tackling complexity in multiscale kinetic and mean-field equations
Abstract
Kinetic and mean-field equations are central to understanding complex systems across fields such as classical physics, engineering, and the socio-economic sciences. Efficiently solving these equations remains a significant challenge due to their high dimensionality and the need to preserve key structural properties of the models.
In this talk, we will focus on recent advancements in deterministic numerical methods, which provide an alternative to particle-based approaches (such as Monte Carlo or particle-in-cell methods) by avoiding stochastic fluctuations and offering higher accuracy. We will discuss strategies for designing these methods to reduce computational complexity while preserving fundamental physical properties and maintaining efficiency in stiff regimes.
Special attention will be given to the role of these methods in addressing multi-scale problems in rarefied gas dynamics and plasma physics. Time permitting, we will also touch on emerging techniques for uncertainty quantification in these systems.
Group discussion on the use of AI tools in research
Abstract
AI tools like ChatGPT, Microsoft Copilot, GitHub Copilot, Claude and even older AI-enabled tools like Grammarly and MS Word, are becoming an everyday part of our research environment. This last-minute opening up of a seminar slot due to the unfortunate illness of our intended speaker (who will hopefully re-schedule for next term) gives us an opportunity to discuss what this means for us as researchers; what are good helpful uses of AI, and are there uses of AI which we might view as inappropriate? Please come ready to participate with examples of things which you have done yourselves with AI tools.
Multilevel Monte Carlo methods
Abstract
In this seminar I will begin by giving an overview of some problems in stochastic simulation and uncertainty quantification. I will then outline the Multilevel Monte Carlo for situations in which accurate simulations are very costly, but it is possible to perform much cheaper, less accurate simulations. Inspired by the multigrid method, it is possible to use a combination of these to achieve the desired overall accuracy at a much lower cost.
Theory to Enable Practical Quantum Advantage
Abstract
Quantum computers are becoming a reality and current generations of machines are already well beyond the 50-qubit frontier. However, hardware imperfections still overwhelm these devices and it is generally believed the fault-tolerant, error-corrected systems will not be within reach in the near term: a single logical qubit needs to be encoded into potentially thousands of physical qubits which is prohibitive.
Due to limited resources, in the near term, hybrid quantum-classical protocols are the most promising candidates for achieving early quantum advantage and these need to resort to quantum error mitigation techniques. I will explain the basic concepts and introduce hybrid quantum-classical protocols are the most promising candidates for achieving early quantum advantage. These have the potential to solve real-world problems---including optimisation or ground-state search---but they suffer from a large number of circuit repetitions required to extract information from the quantum state. I will finally identify the most likely areas where quantum computers may deliver a true advantage in the near term.
Bálint Koczor
Associate Professor in Quantum Information Theory
Mathematical Institute, University of Oxford
Machine learning in solution of inverse problems: subjective perspective
Abstract
Following the 2012 breakthrough in deep learning for classification and visions problems, the last decade has seen tremendous raise of interest in machine learning in a wider mathematical research community from foundational research through field specific analysis to applications.
As data is at the core of any inverse problem, it was a natural direction for the field to investigate how machine learning could aid various aspects of inversion yielding numerous approaches from somewhat ad-hoc but very effective like learned unrolled methods to provably convergent learned regularisers with everything in between. In this talk I will review some on these developments through a lens of the research of our group.
On the loss of orthogonality in low-synchronization variants of reorthogonalized block classical Gram-Schmidt
Abstract
Numerical examples from the BlockStab toolbox are included throughout, to help compare variants and illustrate the effects of different choices of intraorthogonalization subroutines.
Finite element approximation of eigenvalue problems
Abstract
In this informal talk I will review some theoretical and practical aspects related to the finite element approximation of eigenvalue problems arising from PDEs.
The review will cover elliptic eigenvalue problems and eigenvalue problems in mixed form, with particular emphasis on the Maxwell eigenvalue problem.
Other topics can be discussed depending on the interests of the audience, including adaptive schemes, approximation of parametric problems, reduced order models.
A New Two-Dimensional Model-Based Subspace Method for Large-Scale Unconstrained Derivative-Free Optimization: 2D-MoSub
Abstract
This seminar will introduce 2D-MoSub, a derivative-free optimization method based on the subspace method and quadratic models, specifically tackling large-scale derivative-free problems. 2D-MoSub combines 2-dimensional quadratic interpolation models and trust-region techniques to update the points and explore the 2-dimensional subspace iteratively. Its framework includes constructing the interpolation set, building the quadratic interpolation model, performing trust-region trial steps, and updating the trust-region radius and subspace. Computation details and theoretical properties will be discussed. Numerical results demonstrate the advantage of 2D-MoSub.
Short Bio:
Pengcheng Xie, PhD (Chinese Academy of Sciences), is joining Lawrence Berkeley National Laboratory as a postdoctoral scholar specializing in mathematical optimization and numerical analysis. He has developed optimization methods, including 2D-MoSub and SUSD-TR. Pengcheng has published in major journals and presented at ISMP 2024 (upcoming), ICIAM 2023, and CSIAM 2022. He received the Hua Loo-keng scholarship in 2019 and the CAS-AMSS Presidential scholarship in 2023.
Structure-preserving hybrid finite element methods
Abstract
The classical finite element method uses piecewise-polynomial function spaces satisfying continuity and boundary conditions. Hybrid finite element methods, by contrast, drop these continuity and boundary conditions from the function spaces and instead enforce them weakly using Lagrange multipliers. The hybrid approach has several numerical and implementational advantages, which have been studied over the last few decades.
In this talk, we show how the hybrid perspective has yielded new insights—and new methods—in structure-preserving numerical PDEs. These include multisymplectic methods for Hamiltonian PDEs, charge-conserving methods for the Maxwell and Yang-Mills equations, and hybrid methods in finite element exterior calculus.
This seminar has been cancelled
Abstract
Joint work Marta Betcke and Bolin Pan
In photoacoustic tomography (PAT) with flat sensor, we routinely encounter two types of limited data. The first is due to using a finite sensor and is especially perceptible if the region of interest is large relatively to the sensor or located farther away from the sensor. In this talk we focus on the second type caused by a varying sensitivity of the sensor to the incoming wavefront direction which can be modelled as binary i.e. by a cone of sensitivity. Such visibility conditions result, in Fourier domain, in a restriction of the data to a bowtie, akin to the one corresponding to the range of the forward operator but further narrowed according to the angle of sensitivity.
We show how we can separate the visible and invisible wavefront directions in PAT image and data using a directional frame like Curvelets, and how such decomposition allows for decoupling of the reconstruction involving application of expensive forward/adjoint solvers from the training problem. We present fast and stable approximate Fourier domain forward and adjoint operators for reconstruction of the visible coefficients for such limited angle problem and a tailored UNet matching both the multi-scale Curvelet decomposition and the partition into the visible/invisible directions for learning the invisible coefficients from a training set of similar data.
The bilevel optimization renaissance through machine learning: lessons and challenges
Abstract
Bilevel optimization has been part of machine learning for over 4 decades now, although perhaps not always in an obvious way. The interconnection between the two topics started appearing more clearly in publications since about 20 years now, and in the last 10 years, the number of machine learning applications of bilevel optimization has literally exploded. This rise of bilevel optimization in machine learning has been highly positive, as it has come with many innovations in the theoretical and numerical perspectives in understanding and solving the problem, especially with the rebirth of the implicit function approach, which seemed to have been abandoned at some point.
Overall, machine learning has set the bar very high for the whole field of bilevel optimization with regards to the development of numerical methods and the associated convergence analysis theory, as well as the introduction of efficient tools to speed up components such as derivative calculations among other things. However, it remains unclear how the techniques from the machine learning—based bilevel optimization literature can be extended to other applications of bilevel programming.
For instance, many machine learning loss functions and the special problem structures enable the fulfillment of some qualification conditions that will fail for multiple other applications of bilevel optimization. In this talk, we will provide an overview of machine learning applications of bilevel optimization while giving a flavour of corresponding solution algorithms and their limitations.
Furthermore, we will discuss possible paths for algorithms that can tackle more complicated machine learning applications of bilevel optimization, while also highlighting lessons that can be learned for more general bilevel programs.
Multilevel Monte Carlo methods for the approximation of failure probability regions
Abstract
In this talk, we consider the problem of approximating failure regions. More specifically, given a costly computational model with random parameters and a failure condition, our objective is to determine the parameter region in which the failure condition is likely to not be satisfied. In mathematical terms, this problem can be cast as approximating the level set of a probability density function. We solve this problem by dividing it into two: 1) The design of an efficient Monte Carlo strategy for probability estimation. 2) The construction of an efficient algorithm for level-set approximation. Following this structure, this talk is comprised of two parts:
In the first part, we present a new multi-output multilevel best linear unbiased estimator (MLBLUE) for approximating expectations. The advantage of this estimator is in its convenience and optimality: Given any set of computational models with known covariance structure, MLBLUE automatically constructs a provenly optimal estimator for any (finite) number of quantities of interest. Nevertheless, the optimality of MLBLUE is tied to its optimal set-up, which requires the solution of a nonlinear optimization problem. We show how the latter can be reformulated as a semi-definite program and thus be solved reliably and efficiently.
In the second part, we construct an adaptive level-set approximation algorithm for smooth functions corrupted by noise in $\mathbb{R}^d$. This algorithm only requires point value data and is thus compatible with Monte Carlo estimators. The algorithm is comprised of a criterion for level-set adaptivity combined with an a posteriori error estimator. Under suitable assumptions, we can prove that our algorithm will correctly capture the target level set at the same cost complexity of uniformly approximating a $(d-1)$-dimensional function.
Fast optimistic methods for monotone equations and convex optimization problems
Please note; the seminar is taking place in Lecture Room 4 on this occasion
Abstract
In this talk, we discuss continuous in time dynamics for the problem of approaching the set of zeros of a single-valued monotone and continuous operator V . Such problems are motivated by minimax convexconcave and, in particular, by convex optimization problems with linear constraints. The central role is played by a second-order dynamical system that combines a vanishing damping term with the time derivative of V along the trajectory, which can be seen as an analogous of the Hessian-driven damping in case the operator is originating from a potential. We show that these methods exhibit fast convergence rates for kV (z(t))k as t ! +1, where z( ) denotes the generated trajectory, and for the restricted gap function, and that z( ) converges to a zero of the operator V . For the corresponding implicit and explicit discrete time models with Nesterov’s momentum, we prove that they share the asymptotic features of the continuous dynamics.
Extensions to variational inequalities and fixed-point problems are also addressed. The theoretical results are illustrated by numerical experiments on bilinear games and the training of generative adversarial networks.
Mathematics: key enabling technology for scientific machine learning
Abstract
Artificial Intelligence (AI) will strongly determine our future prosperity and well-being. Due to its generic nature, AI will have an impact on all sciences and business sectors, our private lives and society as a whole. AI is pre-eminently a multidisciplinary technology that connects scientists from a wide variety of research areas, from behavioural science and ethics to mathematics and computer science.
Without downplaying the importance of that variety, it is apparent that mathematics can and should play an active role. All the more so as, alongside the successes of AI, also critical voices are increasingly heard. As Robert Dijkgraaf (former director of the Princeton Institute of Advanced Studies) observed in May 2019: ”Artificial intelligence is in its adolescent phase, characterised by trial and error, self-aggrandisement, credulity and lack of systematic understanding.” Mathematics can contribute to the much-needed systematic understanding of AI, for example, greatly improving reliability and robustness of AI algorithms, understanding the operation and sensitivity of networks, reducing the need for abundant data sets, or incorporating physical properties into neural networks needed for super-fast and accurate simulations in the context of digital twinning.
Mathematicians absolutely recognize the potential of artificial intelligence, machine learning and (deep) neural networks for future developments in science, technology and industry. At the same time, a sound mathematical treatment is essential for all aspects of artificial intelligence, including imaging, speech recognition, analysis of texts or autonomous driving, implying it is essential to involve mathematicians in all these areas. In this presentation, we highlight the role of mathematics as a key enabling technology within the emerging field of scientific machine learning. Or, as I always say: ”Real intelligence is needed to make artificial intelligence work.”
ESPIRA: Estimation of Signal Parameters via Iterative Rational Approximation
Abstract
We introduce a new method - ESPIRA (Estimation of Signal Parameters via Iterative Rational Approximation) \cite{DP22, DPP21} - for the recovery of complex exponential sums
$$
f(t)=\sum_{j=1}^{M} \gamma_j \mathrm{e}^{\lambda_j t},
$$
that are determined by a finite number of parameters: the order $M$, weights $\gamma_j \in \mathbb{C} \setminus \{0\}$ and nodes $\mathrm{e}^{\lambda_j} \in \mathbb{C}$ for $j=1,...,M$. Our new recovery procedure is based on the observation that Fourier coefficients (or DFT coefficients) of exponential sums have a special rational structure. To reconstruct this structure in a stable way we use the AAA algorithm proposed by Nakatsukasa et al. We show that ESPIRA can be interpreted as a matrix pencil method applied to Loewner matrices.
During the talk we will demonstrate that ESPIRA outperforms Prony-like methods such as ESPRIT and MPM for noisy data and for signal approximation by short exponential sums.
Bibliography
N. Derevianko, G. Plonka,
Exact reconstruction of extended exponential sums using rational approximation of their Fourier coefficients, Anal. Appl., 20(3), 2022, 543-577.
N. Derevianko, G. Plonka, M. Petz,
From ESPRIT to ESPIRA: Estimation of signal parameters by iterative rational approximation, IMA J. Numer. Anal., 43(2), 2023, 789--827.
Y. Nakatsukasa, O. Sète, L.N. Trefethen, The AAA algorithm for rational approximation.
SIAM J. Sci. Comput., 40(3), 2018, A1494–A1522.
Heavy-Tailed Large Deviations and Sharp Characterization of Global Dynamics of SGDs in Deep Learning
Abstract
While the typical behaviors of stochastic systems are often deceptively oblivious to the tail distributions of the underlying uncertainties, the ways rare events arise are vastly different depending on whether the underlying tail distributions are light-tailed or heavy-tailed. Roughly speaking, in light-tailed settings, a system-wide rare event arises because everything goes wrong a little bit as if the entire system has conspired up to provoke the rare event (conspiracy principle), whereas, in heavy-tailed settings, a system-wide rare event arises because a small number of components fail catastrophically (catastrophe principle). In the first part of this talk, I will introduce the recent developments in the theory of large deviations for heavy-tailed stochastic processes at the sample path level and rigorously characterize the catastrophe principle for such processes.
The empirical success of deep learning is often attributed to the mysterious ability of stochastic gradient descents (SGDs) to avoid sharp local minima in the loss landscape, as sharp minima are believed to lead to poor generalization. To unravel this mystery and potentially further enhance such capability of SGDs, it is imperative to go beyond the traditional local convergence analysis and obtain a comprehensive understanding of SGDs' global dynamics within complex non-convex loss landscapes. In the second part of this talk, I will characterize the global dynamics of SGDs building on the heavy-tailed large deviations and local stability framework developed in the first part. This leads to the heavy-tailed counterparts of the classical Freidlin-Wentzell and Eyring-Kramers theories. Moreover, we reveal a fascinating phenomenon in deep learning: by injecting and then truncating heavy-tailed noises during the training phase, SGD can almost completely avoid sharp minima and hence achieve better generalization performance for the test data.
This talk is based on the joint work with Mihail Bazhba, Jose Blanchet, Bohan Chen, Sewoong Oh, Zhe Su, Xingyu Wang, and Bert Zwart.
Stabilized Lagrange-Galerkin schemes for viscous and viscoelastic flow problems
Abstract
Many researchers are developing stable and accurate numerical methods for flow problems, which roughly belong to upwind methods or characteristics(-based) methods.
The Lagrange-Galerkin method proposed and analyzed in, e.g., [O. Pironneau. NM, 1982] and [E. S\"uli. NM, 1988] is the finite element method combined with the idea of the method of characteristics; hence, it belongs to the characteristics(-based) methods. The advantages are the CFL-free robustness for convection-dominated problems and the symmetry of the resulting coefficient matrix. In this talk, we introduce stabilized Lagrange-Galerkin schemes of second order in time for viscous and viscoelastic flow problems, which employ the cheapest conforming P1-element with the help of pressure-stabilization [F. Brezzi and J. Pitk\"aranta. Vieweg+Teubner, 1984] for all the unknown functions, i.e., velocity, pressure, and conformation tensor, reducing the number of DOFs.
Focusing on the recent developments of discretizations of the (non-conservative and conservative) material derivatives and the upper-convected time derivative, we present theoretical and numerical results.
On the use of "conventional" unconstrained minimization solvers for training regression problems in scientific machine learning
Abstract
In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis, by means of deep-learning techniques, of data produced by computational science and engineering applications. At the core of these methods is the supervised training algorithm to learn the neural network realization, a highly non-convex optimization problem that is usually solved using stochastic gradient methods.
However, distinct from deep-learning practice, scientific machine-learning training problems feature a much larger volume of smooth data and better characterizations of the empirical risk functions, which make them suited for conventional solvers for unconstrained optimization.
In this talk, we empirically demonstrate the superior efficacy of a trust region method based on the Gauss-Newton approximation of the Hessian in improving the generalization errors arising from regression tasks when learning surrogate models for a wide range of scientific machine-learning techniques and test cases. All the conventional solvers tested, including L-BFGS and inexact Newton with line-search, compare favorably, either in terms of cost or accuracy, with the adaptive first-order methods used to validate the surrogate models.
Hierarchical adaptive low-rank format with applications to discretized PDEs
Abstract
A novel framework for hierarchical low-rank matrices is proposed that combines an adaptive hierarchical partitioning of the matrix with low-rank approximation. One typical application is the approximation of discretized functions on rectangular domains; the flexibility of the format makes it possible to deal with functions that feature singularities in small, localized regions. To deal with time evolution and relocation of singularities, the partitioning can be dynamically adjusted based on features of the underlying data. Our format can be leveraged to efficiently solve linear systems with Kronecker product structure, as they arise from discretized partial differential equations (PDEs). For this purpose, these linear systems are rephrased as linear matrix equations and a recursive solver is derived from low-rank updates of such equations.
We demonstrate the effectiveness of our framework for stationary and time-dependent, linear and nonlinear PDEs, including the Burgers' and Allen–Cahn equations.
This is a joint work with Daniel Kressner and Stefano Massei.
14:00
Algorithmic Insurance
Abstract
As machine learning algorithms get integrated into the decision-making process of companies and organizations, insurance products are being developed to protect their providers from liability risk. Algorithmic liability differs from human liability since it is based on data-driven models compared to multiple heterogeneous decision-makers and its performance is known a priori for a given set of data. Traditional actuarial tools for human liability do not consider these properties, primarily focusing on the distribution of historical claims. We propose, for the first time, a quantitative framework to estimate the risk exposure of insurance contracts for machine-driven liability, introducing the concept of algorithmic insurance. Our work provides ML model developers and insurance providers with a comprehensive risk evaluation approach for this new class of products. Thus, we set the foundations of a niche area of research at the intersection of the literature in operations, risk management, and actuarial science. Specifically, we present an optimization formulation to estimate the risk exposure of a binary classification model given a pre-defined range of premiums. Our approach outlines how properties of the model, such as discrimination performance, interpretability, and generalizability, can influence the insurance contract evaluation. To showcase a practical implementation of the proposed framework, we present a case study of medical malpractice in the context of breast cancer detection. Our analysis focuses on measuring the effect of the model parameters on the expected financial loss and identifying the aspects of algorithmic performance that predominantly affect the risk of the contract.
Paper Reference: Bertsimas, D. and Orfanoudaki, A., 2021. Pricing algorithmic insurance. arXiv preprint arXiv:2106.00839.
Paper link: https://arxiv.org/pdf/2106.00839.pdf
14:00
From Chebfun3 to RTSMS: A journey into deterministic and randomized Tucker decompositions
Abstract
In this talk we will first focus on the continuous framework and revisit how Tucker decomposition forms the foundation of Chebfun3 for numerical computing with 3D functions and the deterministic algorithm behind Chebfun3. The key insight is that separation of variables achieved via low-rank Tucker decomposition simplifies and speeds up lots of subsequent computations.
We will then switch to the discrete framework and discuss a new algorithm called RTSMS (randomized Tucker with single-mode sketching). The single-mode sketching aspect of RTSMS allows utilizing simple sketch matrices which are substantially smaller than alternative methods leading to considerable performance gains. Within its least-squares strategy, RTSMS incorporates leverage scores for efficiency with Tikhonov regularization and iterative refinement for stability. RTSMS is demonstrated to be competitive with existing methods, sometimes outperforming them by a large margin.
14:00
A strongly polynomial algorithm for the minimum-cost generalized flow problem
Abstract
We give a strongly polynomial algorithm for minimum cost generalized flow, and as a consequence, for all linear programs with at most two nonzero entries per row, or at most two nonzero entries per column. While strongly polynomial algorithms for the primal and dual feasibility problems have been known for a long time, various combinatorial approaches used for those problems did not seem to carry over to the minimum-cost variant.
Our approach is to show that the ‘subspace layered least squares’ interior point method, an earlier joint work with Allamigeon, Dadush, Loho and Natura requires only a strongly polynomial number of iterations for minimum cost generalized flow. We achieve this by bounding the straight line complexity, introduced in the same paper. The talk will give an overview of the interior point method as well as the combinatorial straight-line complexity analysis for the particular setting. This is joint work with Daniel Dadush, Zhuan Khye Koh, Bento Natura, and Neil Olver.