Lerc 2 hours ago

If you think of it from the point of view of the universal approximation theorem, it's all efficiency optimisation. We know that it works if we do it incredibly inefficiently.
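To make that concrete, here's a minimal sketch (assuming numpy; the hidden width and target function are my own illustrative choices, not from the comment): a single hidden layer of random, untrained ReLU features, with only the linear output weights solved by least squares, is already enough to fit a simple function. Crude and wildly over-parameterised, but it works, which is the universal approximation theorem's point.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 5000                           # the "incredibly inefficient" width
x = np.linspace(-np.pi, np.pi, 1000).reshape(-1, 1)
y = np.sin(x)                             # arbitrary target function

# Random hidden layer: weights and biases are never trained.
W = rng.normal(size=(1, n_hidden))
b = rng.normal(size=n_hidden)
H = np.maximum(x @ W + b, 0.0)            # ReLU activations

# Fit only the linear output layer by least squares.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
print("max abs error:", np.abs(H @ w_out - y).max())
```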

Every architecture improvement is essentially a way to achieve the capability of a single fully-connected hidden-layer network of width n, with fewer parameters.
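As a rough illustration of the parameter savings (the shapes below are my own example): a fully connected layer mapping a 28x28 input to a 28x28 output needs hundreds of thousands of weights, while a 3x3 convolution producing the same output shape shares one small kernel everywhere, because the architecture bakes in translation invariance instead of learning it.

```python
# Rough arithmetic sketch with illustrative shapes: parameters needed
# to map a 28x28 input to a 28x28 output.
h = w = 28

dense_params = (h * w) * (h * w) + (h * w)   # full weight matrix + biases
conv_params = 3 * 3 + 1                      # one shared 3x3 kernel + bias

print(f"fully connected: {dense_params:,} parameters")   # 615,440
print(f"3x3 convolution: {conv_params:,} parameters")    # 10
```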

Given that these architectures usually still contain fully connected layers, they should, unless their designers have done something really wrong, still be able to approximate anything if you make the entire thing large enough.

That means a large enough [insert model architecture] will be able to approximate any function to arbitrary precision. As long as the architecture's efficiency gains are retained as the scale increases, it should be able to get there more quickly.