dofm 10 hours ago

Well, many of us who shared hardware also ran monitoring to make sure the share was fair; there used to be a whole industry for that sort of quota stuff.

You can presumably hard-limit LLMs the same way: a total quota, a burst quota, etc.
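For what it's worth, here is a rough sketch of how a total quota plus a burst quota could be combined for LLM usage. Everything in it is my own assumption for illustration: the `Quota` class, its field names, and counting usage in "tokens" are hypothetical, not any provider's actual API. The burst part is an ordinary token bucket; the total part is just a running counter.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    """Hypothetical per-user limiter: a hard total cap plus a token-bucket burst cap."""
    total_limit: int        # hard cap on tokens for the whole billing period
    burst_limit: int        # bucket capacity: most tokens usable in one burst
    refill_per_sec: float   # how fast the burst allowance recovers
    used: int = field(default=0, init=False)
    bucket: float = field(default=0.0, init=False)
    last: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        # Start with a full burst allowance.
        self.bucket = float(self.burst_limit)

    def allow(self, tokens: int, now: float) -> bool:
        """Return True and charge the request if it fits both limits."""
        # Refill the bucket for the time elapsed since the last call.
        elapsed = max(0.0, now - self.last)
        self.bucket = min(float(self.burst_limit),
                          self.bucket + elapsed * self.refill_per_sec)
        self.last = now
        # Reject if either the total cap or the burst cap would be exceeded.
        if self.used + tokens > self.total_limit or tokens > self.bucket:
            return False
        self.used += tokens
        self.bucket -= tokens
        return True


q = Quota(total_limit=100, burst_limit=50, refill_per_sec=10)
first = q.allow(40, now=0.0)       # fits: bucket drops from 50 to 10
second = q.allow(20, now=0.0)      # rejected: only 10 left in the bucket
third = q.allow(20, now=1.0)       # one second later the bucket is back to 20
fourth = q.allow(50, now=100.0)    # bucket is full, but 60 + 50 > total cap
```

Passing `now` in explicitly (instead of reading the clock inside `allow`) is only there to keep the sketch deterministic; a real limiter would more likely use a monotonic clock and persist `used` somewhere shared.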

(Suddenly getting a very fun flashback to the environment in which someone first explained Markov chains to me: MediaMOO, a text-based chat environment with configurable limits on the number of CPU "ticks" you were allowed to spend doing things.)