Illniyar 11 hours ago

I think the logistics of calculating cost in real time is something that is extremely hard. I don't think a single big cloud service provider has hard limits instead of alerts.

As long as they revert the charge when notified of scenarios like this, and they have historically done so in many cases, it's fine. It's an acceptable workaround for a hard problem and a cost of doing business (just like credit card companies accept a certain amount of loss to fraud as part of doing business).

cryptonym 10 hours ago | parent | next [-]

They don't have to compute it in real time. They can cut service when they detect that usage has reached the limit, and anything consumed past that point is free of charge.

Overcharge protection doesn't have to be free. It could be +5% on prices or a fee of 25% when you reach the threshold.

They would then have a financial interest in calculating cost in real time, and it would magically become more and more precise over releases.
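
A minimal sketch of the two pricing options floated above, to make the arithmetic concrete. Everything here is hypothetical: the function names are made up, and the 25% fee is assumed to be a fraction of the cap, which the comment does not specify. In both options, usage past the cap is written off by the provider.

```python
from decimal import Decimal


def bill_with_surcharge(metered_cost, cap, surcharge_rate=Decimal("0.05")):
    """Option A: every price carries a surcharge (assumed 5%).

    Usage beyond the cap is not charged; the provider absorbs it.
    """
    charged = min(metered_cost, cap)
    return charged * (1 + surcharge_rate)


def bill_with_trip_fee(metered_cost, cap, fee_rate=Decimal("0.25")):
    """Option B: normal prices, plus a fee once the cap is reached.

    Assumption: the fee is fee_rate times the cap.
    """
    if metered_cost < cap:
        return metered_cost
    return cap + cap * fee_rate


# A runaway job meters $350 against a $100 cap.
print(bill_with_surcharge(Decimal("350"), Decimal("100")))
print(bill_with_trip_fee(Decimal("350"), Decimal("100")))
```

Either way the provider eats the detection lag, which is the incentive to shrink it.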

wongarsu 11 hours ago | parent | prev | next [-]

Cutting off at the exact cent is difficult, but a hard limit that triggers within one dollar of the actual limit should really be possible.

If for some resources you can't sample measurements fast enough, you could weaken it to "triggers within one dollar or five minutes after cost overrun, whichever comes later". But LLM APIs are one of those cases where time isn't a factor. Your only issue is that if you only check the quota before each inference, a given query might bring you over.
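
One way around the "last query brings you over" issue is to reserve the worst-case cost of a request before running it, then settle with the actual cost afterwards. This is a rough sketch under stated assumptions, not how any provider does it: `BudgetGuard`, `worst_case_cost`, and the per-1k-token prices are all hypothetical, and it assumes the request carries a maximum output token count.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a request could push spending past the hard limit."""


class BudgetGuard:
    def __init__(self, limit):
        self.limit = limit
        self.spent = Decimal("0")     # settled, actual cost
        self.reserved = Decimal("0")  # worst-case cost of in-flight requests

    def reserve(self, worst_case):
        """Refuse the request if even its worst case would exceed the limit."""
        if self.spent + self.reserved + worst_case > self.limit:
            raise BudgetExceeded("request could exceed the hard limit")
        self.reserved += worst_case
        return worst_case

    def settle(self, reservation, actual):
        """Release the reservation and record what the request really cost."""
        self.reserved -= reservation
        self.spent += actual


def worst_case_cost(prompt_tokens, max_output_tokens, in_per_1k, out_per_1k):
    """Upper bound on cost: known prompt size plus the output token cap."""
    return (Decimal(prompt_tokens) * in_per_1k
            + Decimal(max_output_tokens) * out_per_1k) / 1000
```

With this shape, spending can never pass the limit. The trade-off is that requests near the limit are rejected even if their actual cost would have fit.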

Nathanba 11 hours ago | parent | prev | next [-]

Why would it be hard to calculate cost? Multiply a fixed price by the number of requests or the time used? It doesn't have to be exact in real time, it just has to report something approximately useful in real time.

It's absolutely not fine to be at the mercy of other people. That's what we buy cloud products, or really any products, for: so that we are not at the mercy of hardware faults, bad weather, bad teeth, hunger, thirst, [insert anything]
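
The "fixed price times request count" estimate can be sketched in a few lines. The operation names and prices below are invented for illustration, and a real bill would include things this ignores (tiered pricing, discounts, late-arriving usage records), so treat the result as approximate by design.

```python
from decimal import Decimal


def estimate_spend(request_counts, unit_prices):
    """Approximate spend: sum of count * unit price per metered operation."""
    return sum(
        (Decimal(count) * unit_prices[op] for op, count in request_counts.items()),
        Decimal("0"),
    )


def should_cut_off(request_counts, unit_prices, budget):
    """True once the approximate spend has reached the budget."""
    return estimate_spend(request_counts, unit_prices) >= budget


# Hypothetical counters and prices.
counts = {"chat": 1200, "embed": 50000}
prices = {"chat": Decimal("0.002"), "embed": Decimal("0.0001")}
print(estimate_spend(counts, prices))
print(should_cut_off(counts, prices, Decimal("5")))
```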

mexicocitinluez 9 hours ago | parent [-]

I'm guessing the answer is simply money. It's probably less expensive to deal with cases like this than it would be to prevent them. Right now, the cost checks seem to run very sparsely, so ramp that up (if it's every 3 hours and they want to change to 5 minutes, that's 36 times as many runs) and they're probably paying more than it costs to employ people to return credits, or than they lose from people leaving.

It sucks, but that's unfortunately the world we live in until something changes.

The US could rely on an agency like the CFPB to prevent this, but that was gutted under the current admin.

EdwardDiego 11 hours ago | parent | prev | next [-]

> I think the logistics of calculating cost in real time is something that is extremely hard.

What makes you think that?

zulban 11 hours ago | parent | prev [-]

Ridiculous. They are clearly not trying at all. A hard wall preventing going over budget by 100x in a couple hours is not some devilishly complicated decentralized system problem.

Don't toe the party line.

Same reason why Azure AI only has easy rate limits by minute, not by day or week or month. Open source proxy projects do it easily tho. Think about the incentives.

Going over a hard cap by 3% would be a reasonable failure to make, not by 30000%.
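
For a sense of how small a per-day (or per-week, per-month) spend cap can be at the proxy level, here is a minimal rolling-window sketch. It is not taken from any particular open source proxy. `WindowBudget` is a hypothetical name, costs are tracked in integer cents, and the caller passes the current time in seconds so the logic is easy to test.

```python
from collections import deque


class WindowBudget:
    """Hard spend cap over a rolling time window."""

    def __init__(self, limit_cents, window_seconds):
        self.limit_cents = limit_cents
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, cost_cents), oldest first
        self._total = 0

    def _evict(self, now):
        # Drop spend records that have aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, cost = self._events.popleft()
            self._total -= cost

    def try_spend(self, cost_cents, now):
        """Record the spend and return True, or return False if it would exceed the cap."""
        self._evict(now)
        if self._total + cost_cents > self.limit_cents:
            return False
        self._events.append((now, cost_cents))
        self._total += cost_cents
        return True


daily = WindowBudget(limit_cents=500, window_seconds=86400)
print(daily.try_spend(300, now=0))     # fits under the cap
print(daily.try_spend(300, now=3600))  # would total 600, rejected
```

Changing `window_seconds` turns the same class into a weekly or monthly cap.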