Remix.run Logo
WhitneyLand 6 hours ago

Yeah important conceptually to remember MTP is kind of just more weights, but speculative decoding is the runtime algorithm that’s a significant add to whatever code is serving the model.

HumanOstrich 5 hours ago | parent [-]

That is.. inaccurate.

WhitneyLand 2 hours ago | parent [-]

How so? I’m not saying most of work doesn’t go into creating the drafting model or enabling a new head on the primary model, but the point is that however cool it is the result is, more weights. Speculative decoding requires code to be aware of how this works at the inference level.