	highfrequency 6 days ago
		Interesting that for these small models, the optimal allocation puts a huge fraction of the total parameters into the embeddings: 170e6 / 250e6 = 68%!
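
		For anyone who wants to check the arithmetic, here is a minimal sketch. The two counts are taken straight from the comment above; the variable names are made up for illustration, and nothing here comes from the model's actual config.

```python
# Parameter counts as quoted in the comment (assumed, not read from a model config).
embedding_params = 170e6
total_params = 250e6

# Share of the parameter budget spent on embeddings.
embedding_fraction = embedding_params / total_params

# Whatever is left over goes to the rest of the network.
non_embedding_params = total_params - embedding_params

print(f"embedding share: {embedding_fraction:.0%}")
print(f"non-embedding params: {non_embedding_params / 1e6:.0f}M")
```

		So only about 80M of the 250M parameters sit outside the embedding table.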