Whilst there aren't many papers on the matter, I would guess that pretraining from scratch is a bit of a waste of money when you could simply expand the depth/width of the 'old' model and retrain only the 'new' bit.
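A minimal sketch of what that could look like in PyTorch (the toy MLP and all names are made up for illustration, not any particular paper's method): freeze the 'old' model's parameters, bolt on a zero-initialized residual block so the expanded model starts out computing exactly the same function, and hand only the new parameters to the optimizer.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained "old" model: a small MLP.
old_model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 32),
    nn.ReLU(),
)

# Freeze the pretrained weights so only the expansion is trained.
for p in old_model.parameters():
    p.requires_grad = False

# Depth expansion: one new block, zero-initialized so the residual
# branch contributes nothing at the start of training.
new_block = nn.Linear(32, 32)
nn.init.zeros_(new_block.weight)
nn.init.zeros_(new_block.bias)

class Expanded(nn.Module):
    def __init__(self, old, new):
        super().__init__()
        self.old, self.new = old, new

    def forward(self, x):
        h = self.old(x)
        return h + self.new(h)  # residual: zero-init == identity

model = Expanded(old_model, new_block)

# Only the new block's parameters reach the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable, lr=1e-3)

# Sanity check: the expanded model initially matches the old one.
x = torch.randn(4, 16)
assert torch.allclose(model(x), old_model(x))
```

The zero-init residual trick is one common way to make the expanded network function-preserving at initialization, so the 'new' bit only has to learn a correction rather than relearn the old model's behaviour.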