Abstracts
Nina Balcan
Title: Machine Learning and Algorithm Design
Abstract: The classic theory-of-computing approach to designing and analyzing algorithms considers hand-designed algorithms and focuses on worst-case guarantees. Since such hand-designed algorithms have weak worst-case guarantees for many problems, in practice machine learning components are often incorporated into algorithm design. In this talk, I will describe recent work in our group that provides theoretical foundations for such learning-augmented algorithms. I will describe both specific case studies (from data science to operations research to computational economics) and general principles applicable broadly to a variety of combinatorial algorithmic problems. I will also discuss how we can loop back and use these tools to learn machine learning algorithms themselves.
Peter Bartlett
Title: Modern machine learning: New directions in optimization theory and statistical theory
Abstract: Modern machine learning has revealed some major surprises from the perspective of theory. These methods seem to achieve their outstanding performance through different mechanisms from those of classical learning theory, mathematical statistics, and optimization theory. Optimization in deep learning relies on simple gradient descent algorithms that are traditionally viewed as a time discretization of gradient flow. However, in practice, large step sizes - large enough to cause oscillation of the loss - exhibit performance advantages. Without any explicit effort to control model complexity, gradient methods exhibit excellent prediction performance in practice. This talk will describe recent progress on the optimization and generalization properties of these methods.
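As a toy illustration of the step-size regimes mentioned above (an illustrative example, not taken from the talk), consider gradient descent on the one-dimensional quadratic $f(x) = \frac{\lambda}{2}x^2$: the update is $x \leftarrow (1 - \eta\lambda)x$, so iterates decay monotonically for $\eta < 1/\lambda$, oscillate in sign yet still converge for $1/\lambda < \eta < 2/\lambda$, and diverge beyond $2/\lambda$.

```python
# Minimal sketch (toy example, not from the talk): gradient descent on
# f(x) = 0.5 * lam * x**2, whose update is x <- (1 - eta * lam) * x.
lam = 1.0

def gd(eta, steps=8, x=1.0):
    traj = [round(x, 4)]
    for _ in range(steps):
        x -= eta * lam * x  # gradient step on f(x) = 0.5 * lam * x^2
        traj.append(round(x, 4))
    return traj

print(gd(eta=0.5))  # monotone decay
print(gd(eta=1.9))  # oscillates in sign, but the loss still shrinks
print(gd(eta=2.1))  # oscillates and diverges
```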
Moses Charikar
Title: How to appease a majority?
Abstract: In 1785, Condorcet established a frustrating property of elections and majority rule: it is possible that, no matter which candidate you pick as the winner, a majority of voters will prefer someone else. You might have the brilliant idea of picking a small set of winners instead of just one, but how do you avoid the nightmare scenario where a majority of the voters prefer some other candidate over all the ones you picked? How many candidates suffice to appease a majority of the voters? In this talk, I will answer this question. Along the way, we will roll some dice -- both because the analysis involves randomness and because of a connection to the curious phenomenon of intransitive dice, which has delighted recreational and professional mathematicians alike ever since Martin Gardner popularized it in 1970.
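To see the intransitivity phenomenon concretely, here is a quick check of a classic triple of intransitive dice (a standard textbook example, not necessarily the dice from the talk): each die beats the next with probability 5/9, yet no single die is best.

```python
from itertools import product

# A classic triple of intransitive dice (each face value appears twice).
A = [2, 2, 4, 4, 9, 9]
B = [1, 1, 6, 6, 8, 8]
C = [3, 3, 5, 5, 7, 7]

def p_beats(X, Y):
    """Probability that die X rolls strictly higher than die Y."""
    return sum(x > y for x, y in product(X, Y)) / (len(X) * len(Y))

# A beats B, B beats C, and C beats A -- each with probability 5/9.
print(p_beats(A, B), p_beats(B, C), p_beats(C, A))
```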
Cornelia Drutu
Title: Property (T) and expanders, a-T-menability and median graphs/cut metrics
Abstract: This talk will give an overview of two opposite properties of infinite groups and their connections to two families of graphs. More precisely, property (T) relates to expander graphs (robust networks) in many ways, while a-T-menability is closely connected with median graphs (economic networks) and, more generally, cut metrics. This latter connection turns out to be fruitful, for instance in the solution of the Goemans-Linial conjecture (relevant to the Sparsest Cut Problem with general demands) using the geometry of Heisenberg groups.
Giles Gardam
Title: SAT for the working mathematician
Abstract: I will discuss solvers for the Boolean satisfiability problem (SAT) as a tool for the working mathematician. SAT is NP-complete, which means it is (in theory) broadly applicable but (in theory) very difficult; however, modern SAT solvers are able to solve remarkably large problems in practice. An example of this from my own experience is producing a counterexample to the unit conjecture for group rings, which had remained open for 80 years.
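As a flavor of how such a tool is driven in practice, here is a minimal sketch (assuming the python-sat package; the encoding is an illustrative choice, not from the talk) that asks a SAT solver to confirm the pigeonhole principle for 3 pigeons and 2 holes.

```python
from pysat.solvers import Glucose3  # pip install python-sat

P, H = 3, 2
var = lambda p, h: p * H + h + 1  # variable: "pigeon p sits in hole h"

solver = Glucose3()
for p in range(P):  # every pigeon must sit in some hole
    solver.add_clause([var(p, h) for h in range(H)])
for h in range(H):  # no hole may hold two pigeons
    for p1 in range(P):
        for p2 in range(p1 + 1, P):
            solver.add_clause([-var(p1, h), -var(p2, h)])

print(solver.solve())  # False: UNSAT, as the pigeonhole principle demands
```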
Andras Juhasz
Title: Reinforcement learning and knot theory
Abstract: Several hard problems in knot theory can be formulated as single-player games, including computing the unknotting number and the 4-ball genus, making these amenable to reinforcement learning (RL) algorithms. There has been substantial progress over the past few years, but there is still a gap between what is possible using RL and more classical methods. I will discuss some of the latest results and remaining challenges.
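To make the single-player-game framing concrete, here is a toy sketch (a word-cancellation game invented for illustration, not an actual knot-theoretic environment): states are words in $a, A, b, B$ (capitals denote inverses), a move deletes an adjacent inverse pair, and the game is won at the empty word. An RL agent would learn a policy over such moves; below we simply play at random.

```python
import random

INVERSE_PAIRS = {"aA", "Aa", "bB", "Bb"}

def moves(word):
    """Indices i such that word[i:i+2] is an adjacent inverse pair."""
    return [i for i in range(len(word) - 1) if word[i:i + 2] in INVERSE_PAIRS]

def random_rollout(word, rng):
    """Play random cancelling moves; return the number of moves if we win."""
    steps = 0
    while word:
        ms = moves(word)
        if not ms:
            return None  # stuck: the word does not reduce to the empty word
        i = rng.choice(ms)
        word = word[:i] + word[i + 2:]
        steps += 1
    return steps

print(random_rollout("abBAaA", random.Random(0)))  # 3 moves to win
```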
Anders Karlsson
Title: The Pervasive Role of Composing Transformations in Machine Learning
Abstract: From the layer maps of neural networks to training procedures and reinforcement learning, compositions of transformations permeate modern AI. These compositional products often involve randomly selected maps, as in weight initialization, stochastic gradient descent (SGD), and dropout. In reinforcement learning, Bellman-type operators with randomness are iterated to update reward structures and strategies. I will discuss the mathematics and geometry underlying the composition of random transformations. In particular, I will explain a general limit law established in joint work with Gouëzel. Moreover, I will discuss a possible cut-off phenomenon related to the depth of neural networks and the influence of iteration order. Motivated by these observations, and in collaboration with Avelin, Dherin, Gonzalvo, Mazzawi, and Munn, we propose backward variants of SGD that improve stability and convergence while maintaining generalisation.
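One classical phenomenon behind such limit laws can be seen numerically (a toy example for illustration, not from the talk): for i.i.d. random contractions $f_i(x) = a_i x + b_i$ with $|a_i| < 1$, the backward compositions $f_1(f_2(\cdots f_n(x)))$ converge almost surely to a single point, while the forward compositions $f_n(\cdots f_1(x))$ converge only in distribution and keep fluctuating.

```python
import random

random.seed(0)
maps = [(random.uniform(-0.9, 0.9), random.uniform(-1, 1)) for _ in range(2000)]

def compose(order, x=0.0):
    """Apply the affine maps x -> a*x + b in the given order."""
    for a, b in order:
        x = a * x + b
    return x

for n in (10, 100, 1000, 2000):
    fwd = compose(maps[:n])            # f_n(... f_1(x)): keeps fluctuating
    bwd = compose(reversed(maps[:n]))  # f_1(... f_n(x)): settles to a point
    print(n, round(fwd, 6), round(bwd, 6))
```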
David Kielak
Title: Gradient descent and Property T
Abstract: I will explain how gradient descent can be used to prove Property (T) for groups.
Alex Lubotzky
Title: Good locally testable codes
Abstract: An error-correcting code is locally testable (LTC) if a random tester reads only a small number of bits of a given word and decides whether the word is in the code, or at least close to it. A long-standing problem asks if there exists such a code that also satisfies the gold standard of coding theory: constant rate and constant distance. Unlike the classical situation in coding theory, random codes are not LTC, so this problem is a challenge of a new kind. We construct such codes based on what we call (Ramanujan) Left/Right Cayley square complexes. These objects seem to be of independent group-theoretic interest. The codes built on them are 2-dimensional versions of the expander codes constructed by Sipser and Spielman (1996). Based on joint work with I. Dinur, S. Evra, R. Livne, and S. Mozes.
Shahar Mendelson
Title: How to overcome the shortcomings of the empirical mean
Abstract: The empirical mean - i.e., the average of the data points one is given - is of obvious importance; for one, it is the most natural guess of the true mean of the sampled object. Unfortunately, and perhaps surprisingly, it is often a terrible guess. As a result, standard procedures based on the empirical mean that are used throughout data science (e.g. empirical risk minimization) perform far from optimally. In this talk I will try to pinpoint the reasons for the suboptimality of the empirical mean and suggest alternatives. I will focus on one indicative example: uniform $L_p$ estimation in a subset of $S^{d-1}$. More accurately, for an isotropic, log-concave random vector $X$ in $\mathbb{R}^d$, a set $T \subset S^{d-1}$, and $p > 2$, the question is to find a procedure $\Phi$ that, given $N$ independent copies $X_1,\dots,X_N$ of $X$, returns for every $t \in T$ its guess of $\mathbb{E}|\langle X,t\rangle|^p$. The performance of the procedure is measured by the largest error over $T$: $\sup_{t \in T} |\Phi(X_1,\dots,X_N,t) - \mathbb{E}|\langle X,t\rangle|^p|$. As it happens, the empirical mean $\frac{1}{N}\sum_{i=1}^N |\langle X_i,t\rangle|^p$ is a poor choice, but identifying a better one is nontrivial. The solution I will outline opens the door to similar improvements in a variety of problems in data science (joint work with D. Bartl).
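For background, one standard robust alternative to the empirical mean is the median-of-means estimator (sketched below as a generic illustration; the procedure for the uniform $L_p$ problem in the talk is more delicate): split the sample into blocks, average each block, and take the median of the block means.

```python
import random
import statistics

def median_of_means(xs, k=10):
    """Split xs into k blocks, average each block, return the median."""
    n = len(xs) // k
    block_means = [sum(xs[i * n:(i + 1) * n]) / n for i in range(k)]
    return statistics.median(block_means)

random.seed(1)
# Heavy-tailed data (Pareto with shape 2.1, true mean 2.1/1.1 ~ 1.909),
# a setting where the empirical mean is fragile.
sample = [random.paretovariate(2.1) for _ in range(10_000)]
print(sum(sample) / len(sample))   # empirical mean: sensitive to outliers
print(median_of_means(sample))     # typically closer to the true mean
```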
Ryan O’Donnell
Title: High-dimensional expanders from group theory... and computer tools
Abstract: In theoretical computer science, an increasingly important role is being played by sparse high-dimensional expanders (HDXs). In turn, the only constructions of these objects that we know are based on groups of Lie type. Most commonly studied are the HDXs based on "A_d-type" groups, yet recent PCP breakthroughs were unlocked only via the topological properties of more exotic, "C_d-type" HDXs. Motivated by this, we prove new topological properties ("cosystolic/coboundary expansion") of coset complex HDXs over non-A_d-type Chevalley groups. I will describe some of the math, but I will also tell some stories about the computer tools involved. These stories will include:
- Martha Evens's SNOBOL code for her friend's 1979 Northwestern PhD thesis
- The Mayor of Evanston, aged 19
- A ChatGPT video game for solving the Word Problem on groups
- Calculating the rank of a 1 million x 2 million matrix
- A computer-drawn 'movie' of a surface triangulation
- Getting some undergrads to formalize 70 pages of math in Lean
Based on joint work with Noah G. Singer (Carnegie Mellon).
Fedor Pavutnitskiy
Title: LLMs for the Working Mathematician
Abstract: In this introductory talk I will survey foundational and recent works in the emerging area of using large language models (LLMs) in contemporary mathematical research. I will also discuss key challenges we have faced and share progress from our own ongoing projects.
Laura Ciobanu Radomirovic
Title: Word equations, constraints and string solvers
Abstract: In this talk I will give an overview of word equations in free monoids and groups that are 'free-like'. I will then discuss how one can express the solutions to word equations as formal languages, and touch upon the (un)decidability of solving word equations with a variety of algebraic and combinatorial constraints. The constraints stem from both theoretical and practical settings, the latter coming from the world of string solvers. If time allows, I will mention recent efforts to solve word equations with reinforcement learning and SMT solvers. This is based on work with V. Diekert and M. Elder, separately with G. Zetzsche, and with/of Albert Garreta.
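To make the objects concrete, here is a naive brute-force "solver" for a single word equation (an illustrative toy, nothing like the industrial string solvers from the talk): it searches for words $x, y$ over $\{a, b\}$ of length at most 3 satisfying the commutation equation $xy = yx$, whose solutions in a free monoid are exactly the pairs of powers of a common word.

```python
from itertools import product

def words(alphabet, max_len):
    """All words over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            yield "".join(w)

# Brute-force search for solutions of the word equation x y = y x.
solutions = [(x, y) for x in words("ab", 3) for y in words("ab", 3)
             if x + y == y + x]
print(solutions)  # only pairs that are powers of a common word appear
```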
Lawrence Saul
Title: A curious correspondence between sparse and low-rank matrices and its myriad practical uses
Abstract: We consider when a sparse nonnegative matrix can be recovered, via a simple elementwise nonlinearity, from a real-valued matrix of significantly lower rank. We show that this question arises naturally in many problems of high dimensional data analysis, and for a particular choice of nonlinearity, we describe an algorithm, known as subzero matrix completion, to discover these low-rank representations. As illustrative examples, we use the algorithm to analyze the synaptic weight matrix of the fruit-fly connectome and the co-occurrence statistics of words in natural language. Finally, we discuss the challenges of scaling this algorithm to very large matrices, as well as recent progress on overcoming these challenges. (Affiliation: Flatiron Institute)
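The correspondence is easy to see in a small numerical experiment (a toy construction for illustration, not the algorithm from the talk): applying the elementwise nonlinearity $\max(0, \cdot)$ to a low-rank real matrix with a negative shift yields a much sparser nonnegative matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(50, 2))
V = rng.normal(size=(2, 50))
L = U @ V - 1.0          # low-rank matrix pushed mostly below zero
X = np.maximum(L, 0.0)   # elementwise nonlinearity: sparse and nonnegative

print(np.linalg.matrix_rank(L + 1.0))  # 2: the hidden low-rank structure
print(float(np.mean(X == 0)))          # a large fraction of exact zeros
```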
Eric Waingarten
Shmuel Weinberger
Title: Naive Reflections on the Evolution of Algorithms
Abstract: A common theme views evolution as a kind of iterative dynamical process that attempts to optimize (or even learns an optimizer) according to a loss or fitness function. This talk will describe some very preliminary ideas about one concrete model of this, some of the mathematics it results in (which lies in quantitative topology), some of the directions where it is lacking (and there are many), and possible next steps.
Standa Zivny
Title: On Relaxations of Satisfiability
Abstract: In this talk, I’ll report on recent results on two types of relaxations of satisfiability for combinatorial problems, namely satisfiability via operators for constraint satisfaction problems, and satisfiability with weaker constraints. A part of the talk is based on joint work with Andrei Bulatov (SFU).