jazzyjackson 5 days ago
Probably not accountant mode, but haven't they always had daily quotas that get used up? As in, they don't want everyone hitting the service nonstop because they don't have enough GPUs to run inference at peak times of day. So it could be a matter of serving a more heavily quantized model, because giving bad results retains users better than "try again later".