Remix clone Hacker News


	▲	VHRanger 3 days ago
		In general, encoder+decoder models are much more efficient at inference than decoder-only models because they run over the entire input all at once (which leverages parallel compute more effectively). The issue is that they're generally harder to train (they need input/output pairs as a training dataset) and don't naturally generalize as well.
	▲	GaggiX 3 days ago
		> In general, encoder+decoder models are much more efficient at inference than decoder-only models because they run over the entire input all at once (which leverages parallel compute more effectively).

		Decoder-only models also do this; the only difference is that they use masked attention.
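
To make the masked-attention point concrete, here is a rough sketch in plain Python (not taken from any particular model or library). The function `attention` and the toy vectors are made up for illustration. It assumes a single head with no learned projections, and it loops over positions for readability. Each row is independent of the others, so a real implementation would typically compute them all in one batched matrix operation.

```python
import math


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention over a whole sequence.

    With causal=True, position i may only attend to positions <= i,
    which is the masked attention used by decoder-only models.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # The mask only limits which keys position i is allowed to see.
        visible = range(i + 1) if causal else range(len(keys))
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in visible
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for w, j in zip(weights, visible):
            for d, v in enumerate(values[j]):
                out[d] += (w / total) * v
        outputs.append(out)
    return outputs


# Toy 3-token input; the same vectors serve as queries, keys, and values.
prompt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edited = prompt[:2] + [[5.0, -3.0]]  # change only the last token

full_a = attention(prompt, prompt, prompt, causal=False)
full_b = attention(edited, edited, edited, causal=False)
causal_a = attention(prompt, prompt, prompt, causal=True)
causal_b = attention(edited, edited, edited, causal=True)

# Masked: earlier positions are unaffected by the later token.
print(causal_a[:2] == causal_b[:2])
# Unmasked: every position sees the change.
print(full_a[0] != full_b[0])
```

Both calls process every input position in the same pass. The mask changes only which positions each row can read from, not whether the rows can be computed together.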