jeeeb 8 hours ago

> It is also a bit weird that they are not incorporating speculative decoding

Wouldn’t speculative decoding decrease overall throughput, but optimise (perceived) responsiveness?

YetAnotherNick 8 hours ago

In the compute-bound regime (high batch size), yes. At low batch sizes, though, decoding is memory-bandwidth bound, so speculative decoding can actually improve throughput.
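
A rough way to see this is a toy roofline model. This is only a sketch: the constants `mem_time` and `compute_per_token` are made-up units, the draft model's cost is ignored, and acceptance is assumed to be i.i.d. per token, so treat the numbers as illustrative rather than measured.

```python
def step_time(tokens_in_flight, mem_time=1.0, compute_per_token=0.01):
    """Toy roofline: a forward pass takes as long as the slower of
    loading the weights (constant) or doing the math (scales with tokens).
    Both constants are hypothetical, chosen only for illustration."""
    return max(mem_time, compute_per_token * tokens_in_flight)


def expected_tokens(accept_rate, k):
    """Expected tokens emitted per verify step with k draft tokens,
    assuming each draft token is accepted independently with accept_rate."""
    if accept_rate == 1.0:
        return k + 1
    return (1 - accept_rate ** (k + 1)) / (1 - accept_rate)


def throughput_plain(batch):
    # One token per sequence per step.
    return batch / step_time(batch)


def throughput_speculative(batch, k=4, accept_rate=0.7):
    # The verify pass scores k + 1 positions per sequence in one step.
    # Draft model cost is ignored here (an assumption, not a fact).
    tokens_out = batch * expected_tokens(accept_rate, k)
    return tokens_out / step_time(batch * (k + 1))


if __name__ == "__main__":
    for batch in (1, 8, 64, 256):
        print(batch, throughput_plain(batch), throughput_speculative(batch))
```

In this model the verify pass is "free" while the step is still memory-bound, so the extra accepted tokens are pure gain. Once the batch is large enough that compute dominates, the rejected draft tokens are wasted FLOPs and throughput falls below plain decoding.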