astrange 2 days ago
Inference often isn't next token prediction though, either weakly (because of speculative decoding/multiple token outputs) or strongly (because of tool usage like web search).

porridgeraisin 2 days ago
Well, I may be misunderstanding you, but speculative decoding is still just next-token prediction: a cheaper draft model predicts the next few tokens one at a time, and the main model checks those guesses in a single pass and keeps the ones it agrees with. Tool usage is also just next-token prediction. The model predicts, token by token, the syntax needed for the tool call; the tool's result is then fed back into the context, and the model goes on predicting the next token from there.
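
To make the speculative decoding point concrete, here is a toy sketch of the greedy case only. `target_model` and `draft_model` are made-up stand-ins (simple arithmetic rules over digit tokens), not real models, and the verification loop is written sequentially where a real system would score all drafted positions in one batched forward pass. The thing to notice is that every token that ends up in the output is the target model's own next-token prediction for that prefix, so the result matches plain greedy decoding.

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(ctx: List[Token]) -> Token:
    # Hypothetical "big" model: next token is the sum of the last two, mod 10.
    return (ctx[-1] + ctx[-2]) % 10


def draft_model(ctx: List[Token]) -> Token:
    # Hypothetical cheap model: usually agrees with the target,
    # but guesses 0 whenever the last token is 7.
    return 0 if ctx[-1] == 7 else (ctx[-1] + ctx[-2]) % 10


def greedy_decode(model: Model, prompt: List[Token], n: int) -> List[Token]:
    # Plain next-token prediction: one token per model call.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model,
                       prompt: List[Token], n: int, k: int = 4) -> List[Token]:
    out = list(prompt)
    while len(out) - len(prompt) < n:
        # 1. The draft model proposes k tokens autoregressively.
        ctx, proposal = list(out), []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each proposed position against its own
        #    next-token prediction (sequential here, batched in practice).
        for tok in proposal:
            expected = target(out)
            out.append(expected)      # always keep the target's token
            if expected != tok:
                break                 # later drafted tokens are now invalid
        else:
            # All k guesses accepted: the target's next prediction comes free.
            out.append(target(out))
    return out[:len(prompt) + n]


prompt = [1, 1]
plain = greedy_decode(target_model, prompt, 20)
spec = speculative_decode(target_model, draft_model, prompt, 20)
print(plain == spec)
```

With sampling at nonzero temperature, real implementations replace the exact-match check with a rejection-sampling step so the output distribution still matches the target model's; this sketch does not cover that case.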