
	simsla a day ago
		Residual connections help avoid vanishing gradients in deeper networks. Also, most blocks with a residual connection approximate the identity function at initialisation, so they tend to be well behaved.
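		A toy scalar sketch of both points, under stated assumptions: each "layer" is a hypothetical h <- tanh(w*h) with a single shared scalar weight w, and the residual version is h <- h + tanh(w*h). The function names, the tanh branch, and zero-initialising the branch weight are illustrative choices, not anything from the comment. The chain rule gives a per-layer derivative of w*(1 - tanh^2) for the plain stack, which shrinks geometrically with depth. The residual stack gets 1 + w*(1 - tanh^2), because the skip path always contributes the 1. With w = 0 the residual stack is exactly the identity.

```python
import math


def plain_forward_grad(x, w, depth):
    """Plain stack h <- tanh(w*h). Returns (output, d output / d input)."""
    h, grad = x, 1.0
    for _ in range(depth):
        h = math.tanh(w * h)
        # d/dh_old tanh(w*h_old) = w * (1 - tanh(w*h_old)^2) = w * (1 - h_new^2)
        grad *= w * (1.0 - h * h)
    return h, grad


def residual_forward_grad(x, w, depth):
    """Residual stack h <- h + tanh(w*h). Returns (output, d output / d input)."""
    h, grad = x, 1.0
    for _ in range(depth):
        branch = math.tanh(w * h)  # residual branch, computed from the old h
        # The skip path contributes 1; the branch contributes w * (1 - tanh^2).
        grad *= 1.0 + w * (1.0 - branch * branch)
        h = h + branch
    return h, grad


if __name__ == "__main__":
    depth = 50
    print("plain, w=0.5:    ", plain_forward_grad(0.5, 0.5, depth))
    print("residual, w=0.0: ", residual_forward_grad(0.5, 0.0, depth))
    print("residual, w=0.01:", residual_forward_grad(0.5, 0.01, depth))
```

		In the plain stack the end-to-end gradient is bounded by 0.5^50, so it effectively vanishes. In the residual stack with a zero-initialised branch the block returns its input unchanged with gradient exactly 1, and a small nonzero w keeps every per-layer factor at or above 1. Real networks use matrices and normalisation, so treat this only as an intuition pump.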