mh- 7 days ago
Yes. Batched inference is also a thing: incoming requests get intelligently grouped (bin-packed) and routed so they can be served together. I expect a good amount of the "secret sauce" lives at this layer. Here's an entry-level link I found quickly on Google, OP: https://medium.com/@wearegap/a-brief-introduction-to-optimiz...
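To make the bin-packing part concrete, here is a rough sketch of what grouping requests into batches could look like. It is purely illustrative: the per-batch token budget, the request sizes, and the names `Batch` and `pack_requests` are all made up for the example, and it uses plain first-fit decreasing, which is a textbook heuristic and not a description of what any particular provider actually runs.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """One batch with a fixed token budget (hypothetical limit)."""
    budget: int
    used: int = 0
    request_ids: list = field(default_factory=list)

    def fits(self, tokens: int) -> bool:
        return self.used + tokens <= self.budget

    def add(self, request_id: str, tokens: int) -> None:
        self.request_ids.append(request_id)
        self.used += tokens


def pack_requests(requests: dict, budget: int) -> list:
    """Group requests into batches with first-fit decreasing.

    `requests` maps a request id to its token count (assumed known up front).
    """
    batches = []
    # Largest requests first, so small ones can fill the leftover space.
    for request_id, tokens in sorted(
        requests.items(), key=lambda kv: kv[1], reverse=True
    ):
        if tokens > budget:
            raise ValueError(f"request {request_id} exceeds the batch budget")
        for batch in batches:
            if batch.fits(tokens):
                batch.add(request_id, tokens)
                break
        else:
            # No existing batch has room, so open a new one.
            new_batch = Batch(budget)
            new_batch.add(request_id, tokens)
            batches.append(new_batch)
    return batches


if __name__ == "__main__":
    pending = {"a": 700, "b": 500, "c": 300, "d": 200, "e": 100}
    for batch in pack_requests(pending, budget=1000):
        print(batch.request_ids, batch.used)
    # ['a', 'c'] 1000
    # ['b', 'd', 'e'] 800
```

A real serving stack would presumably also have to weigh latency targets, requests arriving while a batch is in flight, and which machine each batch gets routed to, so treat this as the simplest possible version of the idea.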