Correct me if I'm wrong, but from reading through the comments in the thread, this seems to be post-training/fine-tuning.

oceansky 8 hours ago

Yes. It's post-training on Qwen using the novel SwiReasoning framework.

I hadn't seen SwiReasoning (https://swireasoning.github.io, paper and code) before. It looks like it works at generation time, without any requirements on the model. It improves token efficiency and accuracy, but at first skim it seems like it would be incompatible with multi-token prediction (MTP). For large reductions in token budget, it could still be worth it.
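To make "works at generation time without any requirements on the model" concrete, here is a minimal sketch, assuming a latent step feeds back a probability-weighted mixture of token embeddings (the "continuous using the whole distribution" idea discussed downthread) while an explicit step feeds back the embedding of one chosen token. Both only need the step's logits and the model's existing input embedding table. The switching rule (go latent when next-token entropy rises, explicit when it falls), the function names, and the toy table are all hypothetical illustrations, not SwiReasoning's actual criterion; the linked paper and code define that.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def hard_embedding(probs, table):
    """Explicit step: embedding row of the single most likely token (greedy)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return list(table[best])


def soft_embedding(probs, table):
    """Latent step: probability-weighted mixture of every token's embedding row."""
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]


def next_input(logits, table, prev_entropy, mode):
    """Pick the next input embedding from this step's logits.

    HYPOTHETICAL switching rule for illustration only (not SwiReasoning's
    actual criterion): go latent when entropy rises, explicit when it falls,
    otherwise keep the current mode.
    """
    probs = softmax(logits)
    h = entropy(probs)
    if prev_entropy is not None:
        if h > prev_entropy:
            mode = "latent"
        elif h < prev_entropy:
            mode = "explicit"
    if mode == "latent":
        emb = soft_embedding(probs, table)
    else:
        emb = hard_embedding(probs, table)
    return emb, h, mode


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-token, 2-dim embedding table
    emb, h, mode = next_input([0.0, 0.0], table, prev_entropy=0.1, mode="explicit")
    print(mode, emb)
```

Nothing in the sketch touches model weights: both modes read only the logits and the embedding table, which is the sense in which a scheme like this can be bolted onto an existing model at decode time.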

rafaquintanilha 7 hours ago

Doesn't look like it's incompatible. Someone already released a quantization using MTP: https://huggingface.co/foxipanda/Rio-3.5-Open-397B-GGUF

hedgehog 6 hours ago

As I understand it, the basic premise of all speculative decoding schemes is that the draft logits don't need to be exact as long as you mostly sample the same tokens, and because each position is fed the embedding of the previous position's sampled token, you sort of "round away" the error. With SwiReasoning, I think you skip the sampling/rounding step and do something continuous using the whole distribution, so it would seem to depend on the accuracy of those values. MTP still makes sense outside the latent reasoning chunks, though.
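One common form of the verification step, the speculative sampling rule, makes the "round away" intuition concrete: a drafted token is accepted with probability min(1, p/q), and on rejection a replacement is drawn from the renormalized residual max(0, p - q). Draft error then only lowers the acceptance rate; it never leaks into the distribution of the emitted token. Below is a minimal pure-Python sketch with toy three-token distributions; the function names and numbers are illustrative, and real engines (including MTP setups) differ in details such as greedy verification or checking several draft tokens per pass.

```python
import random


def sample(dist, rng):
    """Draw a token index from a probability list by inverse-CDF sampling."""
    r = rng.random()
    acc = 0.0
    for tok, w in enumerate(dist):
        acc += w
        if r < acc:
            return tok
    return len(dist) - 1  # guard against float round-off


def speculative_step(p_target, q_draft, draft_token, rng):
    """Verify one drafted token with the speculative sampling rule.

    Accept with probability min(1, p/q); on rejection, resample from the
    residual max(0, p - q), renormalized. If draft_token was sampled from
    q_draft, the emitted token is distributed exactly as p_target.
    """
    p, q = p_target[draft_token], q_draft[draft_token]
    if rng.random() < min(1.0, p / q):
        return draft_token, True
    residual = [max(0.0, pt - qt) for pt, qt in zip(p_target, q_draft)]
    total = sum(residual)
    return sample([w / total for w in residual], rng), False


def empirical(p_target, q_draft, n=100_000, seed=0):
    """Run many draft+verify rounds; return output frequencies and acceptance rate."""
    rng = random.Random(seed)
    counts = [0] * len(p_target)
    accepted = 0
    for _ in range(n):
        drafted = sample(q_draft, rng)  # cheap, slightly wrong draft
        tok, ok = speculative_step(p_target, q_draft, drafted, rng)
        counts[tok] += 1
        accepted += ok
    return [c / n for c in counts], accepted / n


if __name__ == "__main__":
    target = [0.6, 0.3, 0.1]
    draft = [0.5, 0.35, 0.15]  # deliberately inaccurate draft distribution
    freqs, rate = empirical(target, draft)
    print(freqs, rate)
```

With these toy numbers the output frequencies land close to the target [0.6, 0.3, 0.1] and the acceptance rate close to the sum of min(p, q), here 0.9. The check only works because a discrete token is sampled and compared; a step that feeds the whole distribution forward has no such token to verify, which is the concern raised above.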

Kelteseth 8 hours ago

Thanks. Firefox and uBlock don't let me view any X content (I guess that's a good thing).

drnick1 7 hours ago

Same here: X content and trackers are blocked by my Firefox settings. The occasional inconvenience is a small price to pay for not being profiled by X, Google, FB, Amazon, and countless other Internet parasites.