Universal approximation with deep narrow networks


Kidger, P
Lyons, T

Publication Date: 

6 August 2020


Proceedings of the 33rd Annual Conference on Learning Theory (COLT 2020)

The classical Universal Approximation Theorem certifies that the universal
approximation property holds for the class of neural networks of arbitrary
width. Here we consider the natural `dual' theorem for width-bounded networks
of arbitrary depth. Precisely, let $n$ be the number of input neurons, $m$ be
the number of output neurons, and let $\rho$ be any nonaffine continuous
function, with a continuous nonzero derivative at some point. Then we show that
the class of neural networks of arbitrary depth, width $n + m + 2$, and
activation function $\rho$, exhibits the universal approximation property with
respect to the uniform norm on compact subsets of $\mathbb{R}^n$. This covers
every activation function possible to use in practice; in particular this
includes polynomial activation functions, making this genuinely different to
the classical case. We go on to establish some natural extensions of this
result. Firstly, we show an analogous result for a certain class of nowhere
differentiable activation functions. Secondly, we establish an analogous result
for noncompact domains, by showing that deep narrow networks with the ReLU
activation function exhibit the universal approximation property with respect
to the $p$-norm on $\mathbb{R}^n$. Finally, we show that a width of only $n + m +
1$ suffices for `most' activation functions (whilst it is known that a width of
$n + m - 1$ does not suffice in general).
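The narrow-but-deep constructions underlying results of this kind typically reserve a few neurons per layer as "registers": some carry the input forward through every layer, and one accumulates the output. The following is a minimal hand-built sketch of that register idea for $n = m = 1$ and width $n + m + 2 = 4$, using ReLU layers to compute $|x|$ exactly; the function name and the specific wiring are illustrative assumptions, not the paper's general construction.

```python
import numpy as np

def relu(v):
    # Componentwise ReLU activation.
    return np.maximum(v, 0.0)

def deep_narrow_abs(x):
    """A width-4 deep ReLU network computing |x| via the 'register' idea.

    Illustrative sketch only (hypothetical wiring, not the paper's
    general construction): two neurons carry the input losslessly as
    the nonnegative pair (relu(x), relu(-x)), one neuron accumulates
    the output, and one neuron is spare.
    """
    # Layer 1: encode x as a pair of nonnegative values (x+, x-),
    # so the input survives the ReLU and can be reused in later layers.
    h = relu(np.array([x, -x, 0.0, 0.0]))
    # Layer 2: accumulate |x| = x+ + x- into the output channel.
    # All entries are nonnegative, so the ReLU leaves them unchanged.
    h = relu(np.array([h[0], h[1], h[0] + h[1], 0.0]))
    # Affine readout: the accumulator channel now holds |x| exactly.
    return h[2]

for x in (-2.0, -0.5, 0.0, 1.3):
    print(x, deep_narrow_abs(x))
```

Stacking more such layers, each adding a further piecewise-linear piece into the accumulator channel, is the intuition for how a fixed-width network of growing depth can approximate increasingly complicated continuous functions.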

Publication Type: 

Conference Paper