Ease-controlled Incremental Gradient-type Algorithm for nonconvex finite-sum optimization
Abstract
We consider minimizing the sum of a large number of smooth and possibly non-convex functions, which is the typical problem encountered in the training of deep neural networks on large-size datasets.
Improving on the Controlled Minibatch Algorithm (CMA) scheme proposed by Liuzzi et al. (2022), we propose CMA Light, an ease-controlled incremental gradient (IG)-like method. The IG iteration is controlled by means of a costless watchdog rule and a derivative-free line search that activates only sporadically to guarantee convergence. The scheme also allows controlling the update of the learning rate used in the main IG iteration, avoiding preset rules and thus overcoming another tricky aspect of implementing online methods.
Convergence to a stationary point holds under the sole assumption of Lipschitz continuity of the gradients of the component functions, without knowing the Lipschitz constant or imposing any growth condition on the norm of the gradients.
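The interplay of a cheap IG pass, a watchdog acceptance test, and a sporadically activated derivative-free line search can be sketched on a toy finite-sum problem as follows. This is only an illustration: the function names, constants, and the specific acceptance and line-search rules below are our assumptions, not the exact CMA Light scheme.

```python
import numpy as np

def ig_epoch(w, grads, lr):
    """One incremental-gradient pass: update w on each component gradient in turn."""
    for g in grads:
        w = w - lr * g(w)
    return w

def df_linesearch(w, d, f, alpha0=1.0, gamma=1e-4, delta=0.5, max_iter=30):
    """Derivative-free line search: backtrack until a sufficient decrease holds."""
    alpha, f0 = alpha0, f(w)
    for _ in range(max_iter):
        if f(w + alpha * d) <= f0 - gamma * alpha**2 * np.dot(d, d):
            return alpha
        alpha *= delta
    return 0.0

def cma_like_step(w, f, grads, lr, f_ref, tau=1e-6):
    """Watchdog rule (illustrative): accept the cheap IG trial point if it
    sufficiently decreases f w.r.t. a reference value; otherwise fall back to
    a derivative-free line search along the full negative gradient and
    shrink the learning rate."""
    w_trial = ig_epoch(w, grads, lr)
    if f(w_trial) <= f_ref - tau:          # costless acceptance test
        return w_trial, f(w_trial), lr
    d = -sum(g(w) for g in grads)          # sporadic safeguard
    alpha = df_linesearch(w, d, f)
    w_new = w + alpha * d
    return w_new, f(w_new), lr * 0.5

# Toy finite-sum problem: f(w) = sum_i 0.5 * ||w - c_i||^2, minimized at the mean of the c_i.
cs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
f = lambda w: sum(0.5 * np.dot(w - c, w - c) for c in cs)
grads = [lambda w, c=c: (w - c) for c in cs]

w, lr, f_ref = np.zeros(2), 0.1, f(np.zeros(2))
for _ in range(50):
    w, f_ref, lr = cma_like_step(w, f, grads, lr, f_ref)
print(w)  # close to the mean of the c_i, i.e. roughly [1., 1.]
```

The point of the sketch is the division of labor: the IG pass does the cheap work, the watchdog test costs only one function evaluation, and the line search (the expensive safeguard) fires only when the cheap step stalls.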
We present two sets of computational tests. First, we compare CMA Light against state-of-the-art mini-batch algorithms for training standard deep networks on large-scale datasets, as well as deep convolutional and residual networks on standard image classification tasks on CIFAR-10 and CIFAR-100.
Results show that CMA Light easily scales up to problems with millions of variables and has an advantage over its state-of-the-art competitors.
Finally, we present computational results on generative tasks, testing the scaling capabilities of CMA Light on image generation with diffusion models (U-Net architecture). CMA Light achieves better test performance and is more efficient than standard SGD with weight decay, thus reducing the computational burden (and the carbon footprint) of the training process.
Laura Palagi, @email
Department of Computer, Control and Management Engineering,
Sapienza University of Rome, Italy
Joint work with
Corrado Coppola, @email
Giampaolo Liuzzi, @email
Lorenzo Ciarpaglini, @email
Understanding the learning dynamics of self-predictive representation learning
Abstract
Self-predictive learning (aka non-contrastive learning) has become an increasingly important paradigm for representation learning. Self-predictive learning is simple yet effective: it learns without contrastive examples and extracts useful representations through a self-predictive objective. An apparent paradox of self-predictive learning is that the optimization objective itself admits trivial representations as globally optimal solutions, yet practical implementations produce meaningful solutions.
We reconcile this theory-practice gap by studying the learning dynamics of self-predictive learning. Our analysis is based on a non-linear ODE system that sheds light on why, despite a seemingly problematic optimization objective, self-predictive learning does not collapse, which echoes important implementation "tricks" used in practice. Our results also show that, in a linear setup, self-predictive learning can be understood as gradient-based PCA or SVD on the data matrix, hinting at the meaningful representations captured through the learning process.
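To give a concrete flavor of the "gradient-based PCA" connection in the linear setup, here is a generic illustration of how plain gradient ascent on the Rayleigh quotient, with renormalization, recovers the top principal component of a data matrix. This is a standard stand-in we chose for illustration, not the paper's exact ODE system; all data and step-size choices below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data whose covariance has one clearly dominant direction.
n, d = 2000, 5
X = rng.normal(size=(n, d)) @ np.diag([2.0, 1.0, 0.5, 0.3, 0.1])
cov = X.T @ X / n

# Projected gradient ascent on the Rayleigh quotient w^T C w (w on the unit sphere):
# a simple gradient-based route to the leading eigenvector, i.e. PCA.
w = rng.normal(size=d)
w /= np.linalg.norm(w)
for _ in range(200):
    w += 0.1 * cov @ w        # gradient step (grad of 0.5 * w^T C w is C w)
    w /= np.linalg.norm(w)    # project back onto the unit sphere

# Compare against the leading eigenvector computed directly.
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, -1]
print(abs(w @ v1))  # close to 1: w aligns with the top principal component
```

The iteration above never collapses to zero because of the renormalization step; in the talk's setting, an analogous role is played by the implementation "tricks" (e.g., how the predictor and stop-gradient are handled) that the ODE analysis makes precise.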
This talk is based on our ICML 2023 paper "Understanding self-predictive learning for reinforcement learning".