	deskamess 2 hours ago
		Oh... so MTP is not speculative decoding? The (T)oken (P)rediction in the name made me think it was on the inference side. I shall read the paper. Edit: OK, I understand now. You are saying that MTP has two aspects: 1) the training (for the mini-models to generate tokens), and 2) the actual speculative decoding implementation on the inference side (which uses those trained mini-models).
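
		For anyone else who mixed the two up, here is a rough sketch of the inference half only: a draft-then-verify loop with greedy acceptance. It is a toy under stated assumptions, not the paper's implementation. `target_model` and `draft_model` are hypothetical stand-ins (plain functions over integer tokens); in the setup described above, the draft role would be played by the trained mini-models.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Toy stand-in for the large model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    # Toy stand-in for a cheap draft model: agrees with the target most of
    # the time, but is deliberately wrong whenever the last token % 4 == 3.
    if ctx[-1] % 4 == 3:
        return 0
    return (ctx[-1] + 1) % 10


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """One draft-then-verify step; returns the newly accepted tokens."""
    # 1) The draft proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2) The target checks each proposed position. A real system would
    #    score all k positions in one batched forward pass; this loop
    #    only mimics the acceptance logic.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) Every draft token matched, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prefix: List[int], n_new: int, k: int = 4) -> List[int]:
    """Generate n_new tokens after prefix using repeated speculative steps."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prefix) + n_new]


def greedy(target: NextToken, prefix: List[int], n_new: int) -> List[int]:
    """Baseline: plain greedy decoding with the target model alone."""
    out = list(prefix)
    for _ in range(n_new):
        out.append(target(out))
    return out
```

		With greedy acceptance, the output is the same as decoding with the target alone; the draft only changes how many target tokens are confirmed per step. For example, `generate(target_model, draft_model, [0], 8)` and `greedy(target_model, [0], 8)` produce the same sequence. Sampling-based variants use a probabilistic accept/reject rule instead of the exact-match check shown here.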