amluto 3 hours ago
How useful is speculative decoding in a batched setting where you get paid for throughput (aggregated across users) and you mostly don’t get paid for latency or single-session throughput?
onlyrealcuzzo 3 hours ago
It's useful at the local level, where there will be SOTA models developed...