Date: Fri, 13 Jun 2025
Time: 11:00 - 12:00
Location: Lecture Room 3
Speaker: Prof Philippe Rigollet
Organisation: Massachusetts Institute of Technology, USA

Since their introduction in 2017, Transformers have revolutionized large language models and the broader field of deep learning. Central to this success is the ground-breaking self-attention mechanism. In this presentation, I’ll introduce a mathematical framework that casts this mechanism as a mean-field interacting particle system, revealing a desirable long-time clustering behaviour. This perspective leads to a trove of fascinating questions with unexpected connections to Kuramoto oscillators, sphere packing, Wasserstein gradient flows, and slow dynamics.
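The particle-system perspective can be illustrated with a minimal numerical sketch. The toy dynamics below are an assumption for illustration (identity query/key/value matrices, Euler time-stepping, particles constrained to the unit sphere): each token is a point on the sphere that drifts toward a softmax-weighted average of all points, and over long times the points cluster together.

```python
import numpy as np

def attention_particle_step(X, beta=1.0, dt=0.1):
    """One Euler step of simplified self-attention dynamics on the sphere.

    Each row of X is a token/particle on the unit sphere. Every particle
    moves toward a softmax-weighted average of all particles, then is
    projected back onto the sphere. (Illustrative simplification:
    query/key/value matrices are taken to be the identity.)
    """
    logits = beta * (X @ X.T)                        # pairwise inner products
    W = np.exp(logits - logits.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)                # row-wise softmax weights
    X = X + dt * (W @ X)                             # drift toward weighted mean
    return X / np.linalg.norm(X, axis=1, keepdims=True)  # project to sphere

# Random tokens on the unit sphere in R^3.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Run the dynamics; pairwise similarities grow as the particles cluster.
for _ in range(2000):
    X = attention_particle_step(X)
print(np.min(X @ X.T))  # close to 1 once the particles have clustered
```

With a small inverse temperature `beta`, the softmax weights are nearly uniform and the particles contract to a single cluster; larger values of `beta` sharpen the attention and slow the merging, which is one way the slow-dynamics questions mentioned in the abstract arise.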

Bio: Philippe Rigollet is a Distinguished Professor of Mathematics at MIT, where he serves as Chair of the Applied Math Committee and Director of the Statistics and Data Science Center. His research spans multiple dimensions of mathematical data science, including statistics, machine learning, and optimization, with recent emphasis on optimal transport and its applications. See https://math.mit.edu/~rigollet/ for more information.

This seminar is hosted by the AI Reading Group.

Last updated on 2 Jun 2025, 1:33pm.