computomatic 7 days ago
This is great product design at its finest. First of all, they never “handle more requests than they have hardware.” That’s impossible (at least as I’m reading it). The vast majority of usage is via their web app (and free accounts, at that). The web app defaults to “auto” selecting a model. The algorithm for that selection is hidden information. As load peaks, they can divert requests to different tiers of hardware and less resource-hungry models. Only a very small minority of requests actually specify the model to use. There are a hundred similar product design hacks they can use to mitigate load, but this seems like the easiest one to implement.
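The routing idea described above can be sketched in a few lines. This is purely illustrative: the model names, thresholds, and the very existence of a load signal are assumptions, not the provider's actual (hidden) algorithm.

```python
from typing import Optional

# Hypothetical load-aware model router. Explicit model choices are
# honored; "auto" requests absorb the load shedding by being routed
# to progressively cheaper models as utilization climbs.
# Tier names and thresholds are made up for illustration.

def select_model(requested: Optional[str], load: float) -> str:
    """Pick a model for a request.

    requested -- model explicitly chosen by the user, or None for "auto"
    load      -- current fleet utilization in [0.0, 1.0]
    """
    if requested is not None:
        return requested  # the small minority that pins a model
    if load < 0.5:
        return "large-model"
    if load < 0.8:
        return "medium-model"
    return "small-model"
```

The key property is that the degradation is invisible by design: since "auto" is the default and its selection criteria are hidden, users on the cheaper tier have no way to tell they were shed.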
addaon 7 days ago | parent
> But this seems like the easiest one to implement.

Even easier: just fail. In my experience the ChatGPT web page fails to display (request? generate?) a response between 5% and 10% of the time, depending on time of day. Too busy? Just ignore your customers. They’ll probably come back and try again, and if not, well, you’re billing them monthly regardless.