petu 2 hours ago

> What makes the guess "right"?

A guess is "right" when it matches the token that would've been picked without speculative decoding. That seems to be more or less agreed upon.

E.g. the vLLM docs list the tests they run to ensure that the output doesn't change when speculative decoding is used: https://github.com/vllm-project/vllm/blob/main/docs/features...

But introducing some threshold to accept other high-probability tokens is an interesting idea.
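
To make the difference concrete, here's a rough sketch of the two acceptance rules, assuming greedy decoding and toy per-position probability lists from the target model. The function names and the `min_prob` parameter are made up for illustration and aren't taken from vLLM or any other library. A real implementation would also emit the target model's own token at the first rejected position, which is left out here.

```python
def accept_exact(draft_tokens, target_probs):
    """Keep draft tokens while each one equals the target model's greedy pick."""
    accepted = []
    for token, probs in zip(draft_tokens, target_probs):
        greedy = max(range(len(probs)), key=probs.__getitem__)
        if token != greedy:
            break  # first mismatch ends the accepted prefix
        accepted.append(token)
    return accepted


def accept_threshold(draft_tokens, target_probs, min_prob=0.3):
    """Hypothetical relaxed rule: keep a draft token if the target model
    gives it at least min_prob, even when it isn't the top pick."""
    accepted = []
    for token, probs in zip(draft_tokens, target_probs):
        if probs[token] < min_prob:
            break  # too unlikely under the target model
        accepted.append(token)
    return accepted


# Toy data: 3 drafted tokens over a 3-token vocabulary.
draft_tokens = [2, 0, 1]
target_probs = [
    [0.10, 0.20, 0.70],  # target's top pick is 2: draft matches
    [0.45, 0.50, 0.05],  # target's top pick is 1: draft guessed 0
    [0.10, 0.80, 0.10],  # target's top pick is 1: draft matches
]

exact = accept_exact(draft_tokens, target_probs)
relaxed = accept_threshold(draft_tokens, target_probs, min_prob=0.3)
```

With the exact rule only the first drafted token survives, while the threshold rule keeps all three. The catch is that the second token is no longer what the target model would have produced on its own, so the output can differ from plain decoding.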