	laidoffamazon 3 days ago
		Interesting. My assumption was that one of the innovations of DeepSeek and the modern GPT models was doing the pretraining itself in low precision, rather than only the later fine-tuning. I didn't realize you still need accumulation at a higher precision anyway.
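
		For anyone curious why the accumulator matters, here is a toy sketch in plain Python. It is not how DeepSeek or any GPT model implements anything; the helper names (`to_fp16`, `sum_fp16_accumulator`, `sum_wide_accumulator`) are made up for illustration, and fp16 stands in for whatever low-precision format is actually used. It sums the same half-precision inputs twice, once rounding the running total back to fp16 after every add and once keeping the total in a wider float.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_fp16_accumulator(values):
    """Sum with the running total rounded to fp16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


def sum_wide_accumulator(values):
    """Sum the same fp16-rounded inputs, but keep the total in a wide float."""
    acc = 0.0
    for v in values:
        acc += to_fp16(v)
    return acc


values = [0.01] * 10_000  # the exact sum would be 100
low = sum_fp16_accumulator(values)
wide = sum_wide_accumulator(values)
print("fp16 accumulator:", low)
print("wide accumulator:", wide)
```

		The low-precision total stops growing once each addend is smaller than half the spacing between neighbouring fp16 values at that magnitude, so it ends up far below 100, while the wide accumulator stays close to it even though the inputs are just as coarse.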