Mon, 25 Nov 2024

14:00 - 15:00
Lecture Room 3

Ease-controlled Incremental Gradient-type Algorithm for nonconvex finite-sum optimization

Laura Palagi
(Sapienza University of Rome)
Abstract

We consider minimizing the sum of a large number of smooth, possibly non-convex functions, the typical problem encountered when training deep neural networks on large datasets.
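
In symbols, the problem is the standard finite-sum minimization

```latex
\[
  \min_{x \in \mathbb{R}^d} \; f(x) \;=\; \sum_{i=1}^{N} f_i(x),
\]
```

where each $f_i$ is smooth and possibly non-convex, and $N$ is large.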

Improving the Controlled Minibatch Algorithm (CMA) scheme proposed by Liuzzi et al. (2022), we propose CMALight, an ease-controlled incremental gradient (IG)-like method. The control of the IG iteration is performed by means of a costless watchdog rule and a derivative-free line search that activates only sporadically to guarantee convergence. The scheme also allows controlling the update of the learning rate used in the main IG iteration, avoiding preset rules and thus overcoming another tricky aspect of implementing online methods.
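
As a rough illustration of the control scheme described above, here is a minimal sketch of an ease-controlled IG loop with a watchdog test and a fallback derivative-free line search. All names, constants, and the specific acceptance test are illustrative assumptions drawn from the abstract, not the CMALight implementation.

```python
import numpy as np

def ease_controlled_ig(grads, f, x0, lr=0.05, gamma=1e-6, max_epochs=100):
    """Illustrative ease-controlled incremental-gradient loop.

    grads: list of per-component gradient functions g_i(x)
    f:     full objective (in the real method the watchdog test is
           "costless"; here we simply evaluate f once per epoch)
    This is a reading of the abstract, not the CMALight code.
    """
    x = x0.copy()
    f_ref = f(x)
    for _ in range(max_epochs):
        trial = x.copy()
        for g in grads:                      # one plain IG epoch
            trial -= lr * g(trial)
        d = trial - x                        # displacement produced by the epoch
        f_trial = f(trial)
        # Watchdog: accept the trial point if it gives sufficient decrease.
        if f_trial <= f_ref - gamma * np.dot(d, d):
            x, f_ref = trial, f_trial
        else:
            # Sporadic derivative-free backtracking line search along d,
            # which also triggers a learning-rate reduction.
            alpha = 1.0
            while alpha > 1e-10 and f(x + alpha * d) > f_ref - gamma * alpha**2 * np.dot(d, d):
                alpha *= 0.5
            x = x + alpha * d
            f_ref = f(x)
            lr *= 0.5
    return x
```

The point of such a scheme is that most epochs run as cheap, uncontrolled IG passes; the line search and learning-rate reduction activate only when the watchdog test fails.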

Convergence to a stationary point holds under the sole assumption of Lipschitz continuity of the gradients of the component functions, without knowing the Lipschitz constant or imposing any growth assumptions on the norm of the gradients.

We present two sets of computational tests. First, we compare CMALight against state-of-the-art mini-batch algorithms for training standard deep networks on large datasets, and for training deep convolutional and residual networks on the CIFAR10 and CIFAR100 image classification tasks.

Results show that CMALight easily scales up to problems with millions of variables and has an advantage over its state-of-the-art competitors.

Finally, we present computational results on generative tasks, testing CMALight's scaling capabilities on image generation with diffusion models (U-Net architecture). CMALight achieves better test performance and is more efficient than standard SGD with weight decay, thus reducing the computational burden (and the carbon footprint) of the training process.

Laura Palagi, @email

Department of Computer, Control and Management Engineering,

Sapienza University of Rome, Italy

 

Joint work with 

Corrado Coppola, @email

Giampaolo Liuzzi, @email

Lorenzo Ciarpaglini, @email

 

 

Mon, 11 Nov 2024

14:00 - 15:00
Lecture Room 3

Understanding the learning dynamics of self-predictive representation learning

Yunhao Tang
(Google Deep Mind)
Abstract

Self-predictive learning (aka non-contrastive learning) has become an increasingly important paradigm for representation learning. Self-predictive learning is simple yet effective: it learns without contrastive examples yet extracts useful representations through a self-predictive objective. A puzzling aspect of self-predictive learning is that the optimization objective itself admits trivial representations as globally optimal solutions, yet practical implementations produce meaningful ones.

 

We reconcile this theory-practice gap by studying the learning dynamics of self-predictive learning. Our analysis centres on a non-linear ODE system that sheds light on why, despite a seemingly problematic optimization objective, self-predictive learning does not collapse, echoing important implementation "tricks" used in practice. Our results also show that, in a linear setup, self-predictive learning can be understood as gradient-based PCA or SVD on the data matrix, hinting that meaningful representations are captured through the learning process.
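
As a point of reference for the linear-case result, the sketch below runs gradient-based PCA in its classical stochastic form (Oja's rule), which recovers the top principal direction of the data. This illustrates the kind of dynamics the result points to; it is not the paper's ODE system or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with one dominant principal direction.
d, n = 10, 5000
v = rng.standard_normal(d); v /= np.linalg.norm(v)
X = rng.standard_normal((n, d)) + 3.0 * rng.standard_normal((n, 1)) * v

# Oja's rule: a stochastic gradient scheme whose fixed point is the
# top eigenvector of the data covariance (gradient-based PCA).
w = rng.standard_normal(d); w /= np.linalg.norm(w)
eta = 1e-3
for x in X:
    y = x @ w
    w += eta * y * (x - y * w)   # Hebbian term minus self-normalization

C = X.T @ X / n
top = np.linalg.eigh(C)[1][:, -1]       # true top principal direction
print("alignment |<w, v1>|:", abs(w @ top))   # should be close to 1
```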

 

This talk is based on our ICML 2023 paper "Understanding self-predictive learning for reinforcement learning".

Mon, 04 Nov 2024

14:00 - 15:00
Lecture Room 3

Efficient high-resolution refinement in cryo-EM with stochastic gradient descent

Bogdan Toader
(MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus)
Abstract

Electron cryomicroscopy (cryo-EM) is an imaging technique widely used in structural biology to determine the three-dimensional structure of biological molecules from noisy two-dimensional projections with unknown orientations. As the typical pipeline involves processing large amounts of data, efficient algorithms are crucial for fast and reliable results. The stochastic gradient descent (SGD) algorithm has been used to speed up ab initio reconstruction, which produces a first, low-resolution estimate of the volume representing the molecule of interest. However, SGD has yet to be applied successfully in the high-resolution regime, where expectation-maximization algorithms achieve state-of-the-art results at a high computational cost.
In this work, we investigate the conditioning of the optimisation problem and show that the large condition number prevents the successful application of gradient descent-based methods at high resolution. 
Our results include: a theoretical analysis of the condition number of the optimisation problem in a simplified setting where the individual projection directions are known; an algorithm that computes a diagonal preconditioner using Hutchinson's diagonal estimator; and numerical experiments showing the improvement in convergence speed when the estimated preconditioner is used with SGD. The preconditioned SGD approach can potentially enable a simple and unified approach to ab initio reconstruction and high-resolution refinement, with faster convergence and greater flexibility, and our results are a promising step in this direction.
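
The diagonal estimator itself is simple to state: for Rademacher probe vectors $z$, $\mathbb{E}[z \odot Hz] = \mathrm{diag}(H)$, so the diagonal of an implicitly defined matrix can be estimated from matrix-vector products alone. Below is a generic sketch of the estimator; the cryo-EM-specific matvec and the way the preconditioner enters the SGD update are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def hutchinson_diag(matvec, dim, num_probes=100, rng=None):
    """Estimate diag(H) using only matrix-vector products v -> H v.

    For Rademacher probes z, E[z * (H z)] = diag(H): the off-diagonal
    cross terms vanish in expectation because E[z_i z_j] = 0 for i != j.
    """
    rng = rng or np.random.default_rng(0)
    est = np.zeros(dim)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=dim)
        est += z * matvec(z)
    return est / num_probes

# Example: recover the diagonal of a random symmetric matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50)); A = (A + A.T) / 2
d_est = hutchinson_diag(lambda u: A @ u, 50, num_probes=2000, rng=rng)
print("max diagonal error:", np.max(np.abs(d_est - np.diag(A))))
```

A preconditioned SGD step would then scale the gradient coordinate-wise, e.g. $x \leftarrow x - \eta\, g / (|\hat d| + \varepsilon)$, which compensates for the large condition number identified in the analysis.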

Wed, 24 Jul 2024
11:00
L5

Dehn functions of nilpotent groups

Jerónimo García-Mejía
(KIT)
Abstract

Since Gromov's celebrated polynomial growth theorem, the understanding of nilpotent groups has become a cornerstone of geometric group theory. An interesting aspect is the conjectural quasi-isometry classification of nilpotent groups. One important quasi-isometry invariant that plays a significant role in the pursuit of classifying these groups is the Dehn function, which quantifies the solvability of the word problem of a finitely presented group. Notably, work of Gersten, Holt, and Riley established that the Dehn function of a nilpotent group of class $c$ is bounded above by $n^{c+1}$.
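
For reference, the standard definition being invoked here (for a finite presentation $\langle A \mid R \rangle$; this is textbook material, not specific to the talk) is:

```latex
\[
  \operatorname{Area}(w) \;=\; \min\Bigl\{\, N \;:\;
    w \,=_{F(A)}\, \prod_{i=1}^{N} x_i\, r_i^{\pm 1}\, x_i^{-1},\;
    x_i \in F(A),\; r_i \in R \,\Bigr\},
\]
\[
  \delta(n) \;=\; \max\bigl\{\, \operatorname{Area}(w) \;:\;
    |w| \le n,\; w =_G 1 \,\bigr\}.
\]
```

The bound $n^{c+1}$ thus says that every word of length at most $n$ representing the identity can be filled using on the order of $n^{c+1}$ relator applications.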

In this talk, I will explain recent results that allow us to compute Dehn functions for extensive families of nilpotent groups arising as central products. Consequently, we obtain a large collection of pairs of nilpotent groups with bilipschitz equivalent asymptotic cones but with different Dehn functions.

This talk is based on joint work with Claudio Llosa Isenrich and Gabriel Pallier.
