Speaker Edward Tansley will talk about 'Low-rank functions in machine learning'.
Functions that vary along a low-dimensional subspace of their input space, often called multi-index or low-rank functions, frequently arise in machine learning. Understanding how such structure emerges can provide insight into the learning dynamics of neural networks. One line of work that explores how networks learn low-rank data representations is the Neural Feature Ansatz (NFA), which states that after training, the Gram matrix of the first-layer weights of a deep network is proportional to some power of the average gradient outer product (AGOP) of the network with respect to its inputs. Existing results prove this relationship for 2-layer linear networks under balanced initialization. In this work, we extend these results to general L-layer linear networks and remove the assumption of balanced initialization for networks trained with weight decay.
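To make the quantities in the abstract concrete, here is a minimal sketch (not the speaker's code) that computes the first-layer Gram matrix and the average gradient outer product (AGOP) for a toy 2-layer linear network in NumPy. The exponent 1/2 used below is the power commonly reported for the NFA and is an assumption here, since the abstract only says "some power".

```python
import numpy as np

# Minimal sketch, not the speaker's code: illustrates the NFA quantities for a
# toy 2-layer *linear* network f(x) = W2 @ W1 @ x. The exponent 1/2 below is
# the commonly reported NFA power and is an assumption; the abstract only says
# "some power" of the AGOP.

rng = np.random.default_rng(0)
d, h = 5, 8                          # input dimension, hidden width
W1 = rng.standard_normal((h, d))     # first-layer weights (stand-in, untrained)
W2 = rng.standard_normal((1, h))     # second-layer weights

# The input gradient of a linear network is constant: grad_x f(x) = (W2 W1)^T,
# so averaging grad f grad f^T over data collapses to a single outer product.
# For general networks the AGOP is (1/n) * sum_i grad_x f(x_i) grad_x f(x_i)^T.
J = W2 @ W1                          # 1 x d Jacobian row
agop = J.T @ J                       # average gradient outer product (d x d)

gram = W1.T @ W1                     # Gram matrix of first-layer weights (d x d)

def sym_matrix_power(M, alpha):
    """Power of a symmetric PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.clip(vals, 0.0, None) ** alpha) @ vecs.T

# The NFA says that *after training* gram is proportional to agop**alpha; at a
# random initialization like this one the two matrices need not be aligned.
target = sym_matrix_power(agop, 0.5)
alignment = np.trace(gram @ target) / (np.linalg.norm(gram) * np.linalg.norm(target))
print(f"cosine alignment between W1^T W1 and AGOP^(1/2): {alignment:.3f}")
```

Running this at random initialization typically gives a low alignment; the NFA is the claim that training (here, of linear networks, with or without balanced initialization depending on the result) drives this alignment toward 1.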