Well I may be misunderstanfing you, but speculative decoding is just using next token predictions from a few models(or many samples from one model) instead of just one sample. It is still next token prediction.

Tool usage is also just next token prediction. You have it predict the next token of the syntax needed for tool use, and then it is fed the result of that in context which it then predicts the next token of.

▲

astrange 2 days ago | parent [-]

The text returned by the tool itself makes it not "next token prediction". Aside from having side effects, the reason it's helpful is that it's out of distribution for the model. So it changes the properties of the system.

	▲	SEGyges 12 hours ago \| parent \| next [-]
		This is true of the system as a whole, but the core neural network is still a next-token predictor.
	▲	porridgeraisin 2 days ago \| parent \| prev [-]
		Ah ok, understood what you meant.