Tue, 23 Feb 2021
14:00
Virtual

Dense for the price of sparse: Initialising deep nets with efficient sparse affine transforms

Ilan Price
(Mathematical Institute)
Abstract

That neural networks may be pruned to high sparsities and retain high accuracy is well established. Recent research efforts focus on pruning immediately after initialization so as to allow the computational savings afforded by sparsity to extend to the training process. In this work, we introduce a new 'DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining. We show that standard training of networks built with these layers, and pruned at initialization, achieves state-of-the-art accuracy for extreme sparsities on a variety of benchmark network architectures and datasets. Moreover, these results are achieved using only simple heuristics to determine the locations of the trainable parameters in the network, and thus without having to initially store or compute with the full, unpruned network, as is required by competing prune-at-initialization algorithms. Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.
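For intuition, the sketch below shows one plausible reading of such a layer: the weight matrix is the sum of a fixed (non-trainable) DCT matrix, which mixes information across all units, and a sparse matrix whose few entries are the only trainable kernel parameters, placed by a simple random heuristic at initialization. The class name, the placement rule, and the density argument are illustrative assumptions, not the construction from the paper.

import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal type-II DCT matrix of size n x n (fixed, never trained).
    k = torch.arange(n).unsqueeze(1).float()   # frequency index (rows)
    i = torch.arange(n).unsqueeze(0).float()   # sample index (columns)
    D = torch.cos(math.pi / n * (i + 0.5) * k)
    D[0] *= 1.0 / math.sqrt(2.0)
    return D * math.sqrt(2.0 / n)


class DCTPlusSparseLinear(nn.Module):
    # Hypothetical square linear layer: weight = fixed DCT + sparse trainable part.

    def __init__(self, n: int, density: float = 0.01):
        super().__init__()
        self.register_buffer("dct", dct_matrix(n))
        # Simple heuristic stand-in for the paper's placement rules:
        # pick a small random set of trainable positions at initialization.
        nnz = max(1, int(density * n * n))
        idx = torch.randperm(n * n)[:nnz]
        self.register_buffer("rows", idx // n)
        self.register_buffer("cols", idx % n)
        self.values = nn.Parameter(torch.zeros(nnz))  # the only trainable kernel entries
        self.bias = nn.Parameter(torch.zeros(n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = self.dct.shape[0]
        sparse = torch.sparse_coo_tensor(
            torch.stack([self.rows, self.cols]), self.values, (n, n)
        )
        weight = self.dct + sparse.to_dense()
        return x @ weight.t() + self.bias


if __name__ == "__main__":
    layer = DCTPlusSparseLinear(64, density=0.001)  # 0.1% trainable kernel parameters
    out = layer(torch.randn(8, 64))
    print(out.shape)  # torch.Size([8, 64])

The point of the fixed dense component is that information still propagates through every unit even when almost all kernel parameters are removed, which is what allows trainability at the extreme sparsities quoted in the abstract; only the sparse entries and biases are ever stored or updated.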

--

A link for this talk will be sent to our mailing list a day or two in advance.  If you are not on the list and wish to be sent a link, please contact @email.
