Remix clone Hacker News

	▲	brookst 14 hours ago
		Thanks for the fantastic explanation! Would it be more efficient to calculate a per-model or per-layer mean, and then store each weight only as its offset from that mean in standard deviations, maybe in fp8 or smaller?
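
		A rough sketch of what that idea could look like, in case it helps make the question concrete. Everything here is an assumption for illustration: the function names `quantize_layer` and `dequantize_layer` are hypothetical, the uniform signed 8-bit integer code is a stand-in for fp8 (not a real fp8 format), and clipping at 4 standard deviations is an arbitrary choice.

```python
import statistics


def quantize_layer(weights, bits=8, clip_sigmas=4.0):
    """Encode one layer's weights as integer z-score codes (illustrative only).

    Returns the per-layer mean, the per-layer standard deviation, and one
    small signed integer per weight.
    """
    mean = statistics.fmean(weights)
    # Fall back to 1.0 so a constant layer does not divide by zero.
    std = statistics.pstdev(weights) or 1.0
    levels = 2 ** (bits - 1) - 1      # 127 for 8 bits
    scale = clip_sigmas / levels      # z-score units per integer step
    codes = []
    for w in weights:
        z = (w - mean) / std          # offset from the mean, in standard deviations
        z = max(-clip_sigmas, min(clip_sigmas, z))  # clip outliers (assumed range)
        codes.append(round(z / scale))
    return mean, std, codes


def dequantize_layer(mean, std, codes, bits=8, clip_sigmas=4.0):
    """Reconstruct approximate weights from the mean, std, and integer codes."""
    levels = 2 ** (bits - 1) - 1
    scale = clip_sigmas / levels
    return [mean + c * scale * std for c in codes]


if __name__ == "__main__":
    layer = [0.10, -0.20, 0.05, 0.30, -0.15]
    mean, std, codes = quantize_layer(layer)
    print(codes)
    print(dequantize_layer(mean, std, codes))
```

		In this sketch the storage cost is one small code per weight plus two floats (mean and standard deviation) per layer, and any weight beyond the clipping range is saturated rather than represented exactly.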