Thu, 23 Nov 2023
Lecture Room 3
Oliver Hinder
University of Pittsburgh

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for the step size of stochastic gradient descent (SGD), and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.
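
To make the certificate idea concrete, below is a hedged sketch of one way a step-size search of this flavor could look. The displacement budget, constants, and function names are illustrative assumptions of mine, not the construction from the papers: it simply tries geometrically increasing step sizes and keeps the largest one whose SGD iterates stay within a prescribed travel budget.

```python
import numpy as np

def sgd_max_displacement(grad, x0, eta, T, rng):
    """Run T steps of SGD with fixed step size eta; return the final iterate
    and the maximum distance the iterates travelled from x0."""
    x, max_disp = x0.copy(), 0.0
    for _ in range(T):
        x = x - eta * grad(x, rng)
        max_disp = max(max_disp, float(np.linalg.norm(x - x0)))
    return x, max_disp

def doubling_search(grad, x0, T, eta0=1e-6, c=1.0, max_doublings=40, rng=None):
    """Hypothetical doubling search: keep the largest step size whose SGD run
    stays within the displacement budget c * eta * sqrt(T). The budget is an
    illustrative stand-in for the actual certificate, not the paper's test."""
    rng = rng if rng is not None else np.random.default_rng(0)
    eta, best = eta0, x0
    for _ in range(max_doublings):
        x, disp = sgd_max_displacement(grad, x0, eta, T, rng)
        if disp > c * eta * np.sqrt(T):   # budget exceeded: keep the last accepted run
            break
        best, eta = x, 2.0 * eta          # budget respected: accept and try a larger step
    return best
```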

Additionally, we present theoretical and numerical results for a dynamic step size schedule for SGD based on a variant of this idea. On a broad range of vision and language transfer learning tasks, our method's performance is close to that of SGD with a tuned learning rate. A per-layer variant of our algorithm also approaches the performance of tuned Adam.
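
As a rough sketch of what a distance-over-gradients style dynamic schedule of this flavor can look like (a minimal illustration with my own naming and an assumed small initial movement parameter r_eps, not the talk's exact algorithm or constants): the step size at each iteration is the maximum distance travelled from the starting point so far, divided by the square root of the running sum of squared stochastic gradient norms.

```python
import numpy as np

def dog_style_sgd(grad, x0, T, r_eps=1e-6, rng=None):
    """SGD with a dynamic, tuning-free step size: max distance travelled from
    x0 so far, divided by sqrt of the running sum of squared gradient norms."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = x0.copy()
    max_dist = r_eps          # small positive initial "movement" so the first step is nonzero
    grad_sq_sum = 0.0
    for _ in range(T):
        g = grad(x, rng)
        grad_sq_sum += float(np.linalg.norm(g) ** 2)
        eta = max_dist / np.sqrt(grad_sq_sum + 1e-12)   # epsilon guards against division by zero
        x = x - eta * g
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
    return x

# Toy usage: noisy quadratic f(x) = 0.5 * ||x||^2 with Gaussian gradient noise.
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = dog_style_sgd(grad, x0=np.ones(10), T=1000)
```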

This talk is based on papers with Yair Carmon and Maor Ivgi.
