Neural network architectures play a key role in determining which functions are fit to training data and the resulting generalization properties of learned predictors. For instance, imagine training an overparameterized neural network with weight decay to interpolate a set of training samples; the network architecture influences which interpolating function is learned.
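As a minimal sketch of this setup (the notation here is illustrative, not taken from the talk): in the small-regularization limit, training with weight decay to interpolate data $(x_i, y_i)_{i=1}^n$ amounts to the min-norm interpolation problem
\[
\min_{\theta} \ \|\theta\|_2^2 \quad \text{subject to} \quad f_\theta(x_i) = y_i, \quad i = 1, \dots, n,
\]
where $f_\theta$ is the function computed by the network with parameters $\theta$; the architecture determines which interpolant attains the minimum.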
In this talk, I will describe new insights into the role of network depth in machine learning using the notion of representation costs – i.e., how much it “costs” for a neural network to represent some function f. First, we will see that there is a family of functions that can be learned with depth-3 networks when the number of samples is polynomial in the input dimension d, but that cannot be learned with depth-2 networks unless the number of samples is exponential in d. Conversely, no function that is easy to learn with a depth-2 network is difficult to learn with a depth-3 network.
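One common formalization (stated here as background; the talk's precise definition may differ) takes the representation cost of a function $f$ to be
\[
R(f) \;=\; \min \{\, \|\theta\|_2^2 \;:\; f_\theta = f \,\},
\]
with the minimum taken over parameters of a given architecture, often allowing arbitrary width. Under this definition, the weight-decay interpolation problem above is equivalent to minimizing $R(f)$ over all functions $f$ that fit the training data, so the representation cost describes exactly which interpolants the architecture prefers.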
Taken together, these results show that deeper networks have an unambiguous advantage over shallower networks in terms of sample complexity. Second, I will show that adding linear layers to a ReLU network yields a representation cost that favors functions with latent low-dimensional structure, such as single- and multi-index models. Overall, these results highlight the role of network depth from a function space perspective and yield new tools for understanding neural network generalization.
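As intuition for this low-rank bias (a standard calculation for purely linear networks, included as an illustration rather than a result from the talk): for a depth-$L$ linear network computing $x \mapsto W_L \cdots W_1 x$, the minimal squared parameter norm needed to represent a linear map $W$ is
\[
\min_{W_L \cdots W_1 = W} \ \sum_{j=1}^{L} \|W_j\|_F^2 \;=\; L \, \|W\|_{S_{2/L}}^{2/L},
\]
the Schatten-$2/L$ quasi-norm of $W$ (the nuclear norm when $L = 2$), which penalizes rank more aggressively as $L$ grows; adding linear layers to a ReLU network induces an analogous bias toward low-dimensional structure.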