jeeeb 8 hours ago
> It is also a bit weird that they are not incorporating speculative decoding

Wouldn’t speculative decoding decrease overall throughput, but optimise (perceived) responsiveness?
YetAnotherNick 8 hours ago
In the compute-bound regime (high batch size), yes. But at low batch sizes decoding is memory-bandwidth-bound, so speculative decoding could improve throughput.
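
To make the batch-size argument concrete, here is a rough roofline-style sketch. Everything in it is an assumption for illustration: the hardware numbers, the 70B-parameter model, the acceptance rate, and the simplifications that the draft model is free and KV-cache traffic is ignored. It models a forward pass as taking the larger of "time to stream the weights" and "time to do the matmuls", and compares tokens per second with and without speculation.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass, assuming each drafted
    token is accepted independently with probability accept_rate (< 1)."""
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)


def pass_time(tokens_in_pass: float, params: float, bytes_per_param: float,
              mem_bw: float, peak_flops: float) -> float:
    """Roofline estimate: a pass costs whichever is slower, reading all
    weights once or doing ~2 FLOPs per parameter per token."""
    memory_time = params * bytes_per_param / mem_bw
    compute_time = 2 * params * tokens_in_pass / peak_flops
    return max(memory_time, compute_time)


def speculative_speedup(batch: int, draft_len: int = 4, accept_rate: float = 0.7,
                        params: float = 70e9, bytes_per_param: float = 2,
                        mem_bw: float = 2e12, peak_flops: float = 1e15) -> float:
    """Throughput with speculation divided by throughput without it.
    All defaults are hypothetical; draft-model cost is assumed to be zero."""
    hw = (params, bytes_per_param, mem_bw, peak_flops)
    # Baseline: one token per sequence per pass.
    baseline = batch / pass_time(batch, *hw)
    # Speculative: verify draft_len + 1 positions per sequence in one pass.
    tokens_out = batch * expected_tokens_per_pass(accept_rate, draft_len)
    speculative = tokens_out / pass_time(batch * (draft_len + 1), *hw)
    return speculative / baseline


if __name__ == "__main__":
    for batch in (1, 8, 64, 256, 1024):
        print(f"batch={batch:5d}  speedup={speculative_speedup(batch):.2f}x")
```

Under these assumed numbers the ratio is above 1 at small batch sizes, where the pass is bandwidth-bound and the extra verified positions are nearly free, and drops below 1 once the batch is large enough that the extra positions cost real compute and rejected drafts are wasted work.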