johnsmith1840 2 hours ago

This has been happening for years. There's a great paper from Microsoft on DeepSpeed AI inference.

Basically, the paper showed methods for handling heavy traffic load by changing model requirements or routing requests to different models. That was a while ago, and I'm sure it's massively more advanced now.
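
To make the routing idea concrete, here's a toy sketch. It is not the method from the paper and not how any provider actually does it; the tier names, the thresholds, and the `pick_tier` function are all made up for illustration. The only point is that the busier the fleet gets, the cheaper the model a request gets sent to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_load: float  # serve from this tier while utilization <= max_load


# Hypothetical tiers, ordered from most to least expensive to serve.
TIERS = [
    ModelTier("full-precision", 0.5),
    ModelTier("quantized", 0.8),
    ModelTier("small-distilled", 1.0),
]


def pick_tier(in_flight: int, capacity: int) -> ModelTier:
    """Return the first tier whose load threshold covers current utilization."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Clamp so an overloaded fleet still maps to the cheapest tier.
    utilization = min(in_flight / capacity, 1.0)
    for tier in TIERS:
        if utilization <= tier.max_load:
            return tier
    return TIERS[-1]


if __name__ == "__main__":
    for in_flight in (10, 70, 500):
        print(in_flight, pick_tier(in_flight, capacity=100).name)
```

With thresholds like these, a quiet period lands every request on the most expensive tier, which would fit the early-morning effect if real systems behave anything like this.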

That's also why some of AI's best work for me happens early in the morning and on weekends! So yes, the best time to code with modern LLM stacks is when nobody else is. It's also possibly why we go through phases of "they neutered the model" some time after a new release, once usage ramps up.