> trained from scratch on 80B tokens of historical data
How can this thing possibly be even remotely coherent when the pretraining corpus is only fine-tuning-scale amounts of data?
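One way to sanity-check the scale is the Chinchilla compute-optimal heuristic of roughly 20 training tokens per parameter (Hoffmann et al., 2022). The snippet below is just illustrative arithmetic, not anything from the project itself; the token counts for the comparison models are rough public figures:

```python
# Rough scale comparison, assuming the Chinchilla heuristic of
# ~20 training tokens per parameter (Hoffmann et al., 2022).
# Token counts are approximate public figures, for context only.
CHINCHILLA_TOKENS_PER_PARAM = 20

corpora = {
    "this model (historical corpus)": 80e9,   # 80B tokens, per the quote
    "GPT-3 (2020)": 300e9,                    # ~300B tokens
    "Llama 3 (2024)": 15e12,                  # ~15T tokens
}

for name, tokens in corpora.items():
    # Parameter count a compute-optimal run of this size would support
    optimal_params = tokens / CHINCHILLA_TOKENS_PER_PARAM
    print(f"{name}: {tokens:.0e} tokens -> ~{optimal_params:.0e} params compute-optimal")
```

By that heuristic, 80B tokens is enough to compute-optimally train a model on the order of 4B parameters, which puts it in the range of early pretraining runs rather than fine-tuning datasets, though still two orders of magnitude below current frontier corpora.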