jmalicki 2 hours ago

The properties that the universal approximation theorem proves are not unique to neural networks.

Any model operating in an infinite-dimensional Hilbert space, such as an SVM with an RBF kernel or Gaussian process regression, has the same property, as do gradient-boosted decision trees (though each is proven via a different theorem, of course).

So the universal approximation theorem tells us nothing about why we should expect neural networks to perform better than those models.
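As a quick sketch (my own illustration, not from the comment): kernel ridge regression with an RBF kernel can interpolate a smooth target to high accuracy on the training points, which is the same "can fit anything" property the universal approximation theorem gives neural networks. The target function and hyperparameters here are arbitrary choices for the demo.

```python
import numpy as np

# Illustration: RBF-kernel ridge regression approximating sin(x).
# gamma and the ridge strength are arbitrary demo values.
X = np.linspace(0, 2 * np.pi, 200)[:, None]
y = np.sin(X).ravel()

def rbf_kernel(A, B, gamma=0.5):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

K = rbf_kernel(X, X)
# Small ridge term for numerical stability; K alone is near-singular.
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), y)
pred = K @ alpha
print(np.max(np.abs(pred - y)))  # training error: very small
```

The point is only that expressive power is cheap; a kernel machine achieves it just as well, so expressivity alone cannot explain why neural networks win in practice.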

hodgehog11 an hour ago | parent

Extremely well said. Universal approximation is necessary but not sufficient for the performance we are seeing. The secret sauce is implicit regularization, which acts analogously to enforcing compression.
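One concrete, well-known instance of implicit regularization (my own illustration, not from the comment): gradient descent on an overparameterized least-squares problem, started at zero, converges to the minimum-norm interpolant rather than an arbitrary one of the infinitely many zero-loss solutions. The dimensions and step size below are arbitrary demo values.

```python
import numpy as np

# Overparameterized linear regression: 10 samples, 50 parameters,
# so infinitely many weight vectors fit the data exactly.
rng = np.random.default_rng(0)
n, d = 10, 50
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain gradient descent from w = 0 on squared loss.
w = np.zeros(d)
lr = 0.005  # small enough for stable convergence here
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)

# The minimum-norm interpolant, computed via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y
print(np.max(np.abs(w - w_min_norm)))  # GD found the min-norm solution
```

The optimizer never saw a norm penalty, yet it picked the smallest-norm fit; that bias toward "simple" interpolants, rather than raw expressive power, is the kind of mechanism being pointed to here.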