Remix.run Logo
zozbot234 40 minutes ago

Speculative decoding is not that useful at scale, it's mostly about making local single-user inference faster. When you're batching multiple inferences together, that's already as fast as the verification you have to perform w/ speculative decoding.