# Fantastic Sparse Neural Networks and Where to Find Them

## Abstract

Sparse neural networks, in which a substantial portion of the components are eliminated, have widely demonstrated their versatility in model compression, robustness improvement, and overfitting mitigation. However, traditional methods for obtaining such sparse networks usually require a fully pre-trained, dense model. As foundation models become prevalent, the cost of this pre-training step can be prohibitive. On the other hand, training intrinsically sparse neural networks from scratch usually leads to inferior performance compared to their dense counterparts.

In this talk, I will present a series of approaches for obtaining such fantastic sparse neural networks by training from scratch, without any dense pre-training step: dynamic sparse training, static sparse training with random pruning, and mask-only training. *First*, I will introduce the concept of in-time over-parameterization (ITOP) (ICML 2021), which enables training sparse neural networks from scratch (commonly known as *sparse training*) to attain the full accuracy of dense models. By dynamically exploring new sparse topologies during training, we avoid the costly necessity of pre-training and re-training, requiring only a single training run to obtain strong sparse neural networks.
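To make the idea concrete, below is a minimal PyTorch sketch of the generic prune-and-regrow loop underlying dynamic sparse training. It is an illustrative sketch, not the actual ITOP code: the `SparseLinear` class, the `prune_and_regrow` method, and the chosen sparsity and update fractions are assumptions made here for exposition.

```python
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    """Linear layer whose weights are masked to a fixed sparsity level."""

    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Start from a random sparse topology: keep (1 - sparsity) of the connections.
        self.register_buffer("mask", (torch.rand_like(self.linear.weight) > sparsity).float())

    def forward(self, x):
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)

    @torch.no_grad()
    def prune_and_regrow(self, update_fraction=0.3):
        """Drop the smallest-magnitude active weights; regrow as many inactive ones at random."""
        w, active = self.linear.weight, self.mask.bool()
        n_update = int(update_fraction * active.sum().item())
        if n_update == 0:
            return
        # Prune: deactivate the n_update active connections with the smallest magnitude.
        mags = torch.where(active, w.abs(), torch.full_like(w, float("inf")))
        drop_idx = torch.topk(mags.flatten(), n_update, largest=False).indices
        drop = torch.zeros(w.numel(), dtype=torch.bool, device=w.device)
        drop[drop_idx] = True
        # Grow: activate the same number of previously inactive connections at random.
        inactive_idx = (~active).flatten().nonzero(as_tuple=True)[0]
        grow_idx = inactive_idx[torch.randperm(inactive_idx.numel(), device=w.device)[:n_update]]
        grow = torch.zeros(w.numel(), dtype=torch.bool, device=w.device)
        grow[grow_idx] = True
        self.mask[drop.view_as(self.mask)] = 0.0
        self.mask[grow.view_as(self.mask)] = 1.0
        w[grow.view_as(w)] = 0.0  # regrown weights restart from zero
```

In a full training loop, `prune_and_regrow` would be called every few hundred optimizer steps; the ITOP perspective is that, given enough such topology updates, the sparse network explores enough parameters over the course of training to reach dense-level accuracy.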

*Second*, ITOP involves additional overhead due to its frequent changes of the sparse topology. Our follow-up work (ICLR 2022) demonstrates that even a naïve, static sparse network produced by random pruning can be trained to reach dense-model performance, as long as the model is sufficiently large. Moreover, I will discuss how we can push training efficiency even further by learning only masks at initialization, without any weight updates, addressing the over-smoothing challenge in building deep graph neural networks (LoG 2022).
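As a rough illustration of the mask-only regime, the sketch below learns a binary mask over frozen random weights using a straight-through top-k, in the spirit of supermask / edge-popup style mask learning. This is an assumption-laden sketch rather than the exact method of the LoG 2022 paper; the `TopKMask` and `MaskedLinear` names and the default density are invented here for illustration.

```python
import torch
import torch.nn as nn

class TopKMask(torch.autograd.Function):
    """Binarize scores by keeping the top-k; pass gradients straight through to the scores."""

    @staticmethod
    def forward(ctx, scores, density):
        k = max(1, int(density * scores.numel()))
        threshold = torch.topk(scores.flatten(), k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through estimator; no gradient for density

class MaskedLinear(nn.Module):
    """Linear layer with frozen random weights; only the per-connection mask scores are trained."""

    def __init__(self, in_features, out_features, density=0.1):
        super().__init__()
        weight = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(weight)
        self.register_buffer("weight", weight)                # weights are never updated
        self.scores = nn.Parameter(torch.rand_like(weight))   # learnable mask scores
        self.density = density

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.density)
        return nn.functional.linear(x, self.weight * mask)
```

Because only `scores` is a trainable parameter, the optimizer never touches the weights: training amounts to searching for a good subnetwork inside the randomly initialized model.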