| ▲ | sofixa 12 hours ago |
| It's not that it's technically impossible. The very simple problem is that there is no way of providing hard spend caps without giving you the opportunity to bring down your whole production environment when the cap is met. No cloud provider wants to give their customers that much rope to hang themselves with. You just know too many customers will do it wrong, or forget to update the cap, or fail to coordinate internally, and things will stop working and take forever to fix. It's easier to waive cost overages than to deal with any of that. |
|
| ▲ | ed_elliott_asc 10 hours ago | parent | next [-] |
| Let people take the risk - some things in production are less important than others. |
| |
| ▲ | arjie 6 hours ago | parent [-] | | They have all the primitives. I think it's just that people are looking for a less raw version than AWS. In fact, perhaps many of these users should be using some platform that runs on top of AWS, or if they're just playing around with an EC2 instance they're probably better off with Digital Ocean or something. AWS is less like your garage door and more like the components to build an industrial-grade blast furnace, which has access doors as part of its design. You are expected to put the interlocks in. Without the analogy, the way you do this on AWS is:
1. Set up an SNS topic.
2. Set up AWS budget notifications to post to it.
3. Set up a lambda that watches the SNS topic.
Then, in the lambda, you can write your own logic, which can be as smart as you like: shut down all instances except for RDS, leave current S3 data in place but make the public bucket private, and so on. The obvious reason why "stop all spending" is not a good idea is that it would require things like "delete all my S3 data and my RDS snapshots", which perhaps some hobbyist might be happy with but is more likely a footgun for the majority of AWS users. In the alternative world where the customer's post is "I set up the AWS budget with the stop-all-spending option and it deleted all my data!" you can't really give them back the data. But in this world, you can give them back the money. So this is the safer world of the two. |
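A minimal sketch of the lambda in step 3, assuming Python and boto3 and that the function is subscribed to the SNS topic the budget notification posts to; the budget-shutdown tag is a hypothetical convention for marking which instances are safe to stop, not anything AWS defines:

    import boto3

    ec2 = boto3.client("ec2")

    def lambda_handler(event, context):
        # The budget alert arrives as an SNS message; here we only care that it fired.
        print("Budget alert:", event["Records"][0]["Sns"]["Message"])

        # Find running instances that have opted in to the shutdown (RDS and S3 are left alone).
        reservations = ec2.describe_instances(
            Filters=[
                {"Name": "instance-state-name", "Values": ["running"]},
                {"Name": "tag:budget-shutdown", "Values": ["allowed"]},  # hypothetical tag
            ]
        )["Reservations"]

        instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
        if instance_ids:
            ec2.stop_instances(InstanceIds=instance_ids)
        return {"stopped": instance_ids}

The point is that the shutdown logic is yours to write, so you decide what counts as expendable and what must survive the cap.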
|
|
| ▲ | callmeal 9 hours ago | parent | prev | next [-] |
| > The very simple problem is that there is no way of providing hard spend caps without giving you the opportunity to bring down your whole production environment when the cap is met. And why is that a problem? And how different is that from "forgetting" to pay your bill and having your production environment brought down? |
| |
| ▲ | sofixa 2 hours ago | parent [-] | | > And how different is that from "forgetting" to pay your bill and having your production environment brought down? AWS will remind you for months before they actually stop it. |
|
|
| ▲ | ndriscoll 9 hours ago | parent | prev | next [-] |
| Why does this always get asserted? It's trivial to do (reserve the cost when you allocate a resource [0]), and it takes two minutes of thinking about the problem to see an answer if you're actually trying to find one instead of looking for reasons why you can't. Data transfer can be pulled into the same model by offering an alternate internet gateway where you pay for some amount of unmetered bandwidth instead of per-byte transfer, as other providers already do. [0] https://news.ycombinator.com/item?id=45880863 |
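A rough sketch of the reserve-on-allocate idea from [0], with made-up prices and a hypothetical Budget class rather than any real AWS API:

    # Illustrative only: the price table and remaining-hours figure are assumptions.
    HOURS_LEFT_IN_CYCLE = 400
    PRICE_PER_HOUR = {"m5.large": 0.096}   # example on-demand rate, USD

    class Budget:
        def __init__(self, cap_dollars):
            self.cap = cap_dollars
            self.reserved = 0.0

        def try_allocate(self, instance_type, count=1):
            # Worst case: the resource runs until the end of the billing cycle.
            cost = PRICE_PER_HOUR[instance_type] * HOURS_LEFT_IN_CYCLE * count
            if self.reserved + cost > self.cap:
                raise RuntimeError("allocation would exceed the spend cap")
            self.reserved += cost
            return cost

    budget = Budget(cap_dollars=500)
    print(budget.try_allocate("m5.large", count=2))   # reserves ~$77 and succeeds

Metered items like per-byte data transfer don't fit a worst-case reservation, which is why the comment swaps them for a flat unmetered-bandwidth gateway.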
| |
| ▲ | kccqzy 9 hours ago | parent [-] | | Reserving the cost until the end of the billing cycle is super unfriendly for spiky traffic and spiky resource usage. And yet one of the main selling points of the cloud is elasticity of resources. If your load is fixed, you wouldn’t even use the cloud after a five minute cost comparison. So your solution doesn’t work for the intended customers of the cloud. | | |
| ▲ | ndriscoll 9 hours ago | parent [-] | | It works just fine. There's no reason you couldn't adjust your billing cap on the fly. I work in a medium-sized org that's part of a large one, and we have to funnel any significant resource request (e.g. for more EKS nodes) through our SRE teams for approval anyway. Actual spiky traffic that you can't plan for or react to is something I've never heard of, and I believe it's a marketing myth. If you do find yourself trying to suddenly add a lot of capacity, you also learn that the elasticity itself is a myth: the provisioning attempt will fail, or e.g. Lambda will hit its scaling rate limit way before a single minimally-sized Fargate container would cap out. If you don't mind the risk, you could also just not set a billing limit. The actual reason to use clouds is for things like security/compliance controls. | | |
| ▲ | kccqzy 8 hours ago | parent [-] | | I think I am having some misunderstanding about exactly how this cost control works. Suppose that a company in the transportation industry needs 100 CPUs worth of resources most of the day and 10,000 CPUs worth of resources during morning/evening rush hours. How would your reserved cost proposal work? Would it require having a cost cap sufficient for 10,000 CPUs for the entire day? If not, how? | | |
| ▲ | ndriscoll 8 hours ago | parent [-] | | 10,000 cores is an insane amount of compute (even 100 cores should already be able to easily handle millions of events/requests per second), and I have a hard time believing a 100x diurnal difference in needs exists at that level, but yes, I was suggesting that they should have their cap high enough to cover 10,000 cores for the remainder of the billing cycle. If they need those 10,000 cores for 4 hours a day, that's still only about a factor of 6 of extra quota, and the quota itself 1. doesn't cost them anything and 2. is currently infinity. I also expect that in reality, if you regularly try to provision 10,000 cores of capacity at once, you'll likely run into provisioning failures. Trying to cost-optimize your business at that level at the risk of not being able to handle your daily needs is insane, and if you needed to take that kind of risk to cut your compute costs by 6x, you should instead go on-prem with full provisioning. Having your servers idle 85% of the day does not matter if it's cheaper and less risky than burst provisioning. The only one benefiting from your utilization-optimization tricks is Amazon, which will happily charge you more than those idle servers would've cost and sell the unused time to someone else. |
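A back-of-the-envelope check of the factor-of-6 figure, using the numbers from the example above:

    base, peak = 100, 10_000             # cores
    peak_hours, day_hours = 4, 24

    used = base * (day_hours - peak_hours) + peak * peak_hours   # 42,000 core-hours/day
    reserved = peak * day_hours                                   # 240,000 core-hours/day
    print(reserved / used)   # ~5.7, i.e. roughly a factor of 6 of extra quota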
|
|
|
|
|
| ▲ | Nevermark 7 hours ago | parent | prev | next [-] |
| > No cloud provider wants to give their customers that much rope to hang themselves with. Since there are in fact two ropes, maybe cloud providers should make it easy for customers to avoid the one they most want to avoid? |
|
| ▲ | archerx 10 hours ago | parent | prev | next [-] |
| Old hosts used to do that. 20 years ago when my podcast started getting popular I was hit with a bandwidth limit exceeded screen/warning. I was broke at the time and could not have afforded the overages (back then the cost per gig was crazy). The podcast not being downloadable for two days wasn’t the end of the world. Thankfully for me the limit was reached at the end of the month. |
|
| ▲ | pyrale 9 hours ago | parent | prev | next [-] |
| > It's not that it's technically impossible. It is technically impossible, in the sense that no tech can fix the greed of the people making these decisions. > No cloud provider wants to give their customers that much rope to hang themselves with. They are so benevolent to us... |
|
| ▲ | scotty79 9 hours ago | parent | prev | next [-] |
| I would love to have an option to automatically bring down the whole production environment once it's costing more than it's earning. Come to think of it, I'd love for this to be the default. When my computer runs out of disk space it crashes; it doesn't go out on the internet and purchase storage with my credit card. |
|
| ▲ | nwellinghoff 7 hours ago | parent | prev | next [-] |
| Orrr, AWS could just buffer it for you. The algorithm: 1) You hit the cap.
2) AWS sends an alert, but your stuff keeps running at no cost to you for 24h.
3) If there's no response, AWS shuts it down forcefully.
4) AWS eats the “cost” because, let's face it, it basically costs them a thousandth of what they bill you for.
5) You get this buffer 3 times a year. After that, they still do the 24h forced shutdown, but you get billed.
Everybody wins. |
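A rough sketch of that buffer policy, purely illustrative; none of these limits or behaviours exist as an AWS feature:

    from datetime import datetime, timedelta

    GRACE_PERIOD = timedelta(hours=24)
    FREE_BUFFERS_PER_YEAR = 3

    class Account:
        def __init__(self):
            self.buffers_used = 0
            self.cap_hit_at = None

        def on_cap_hit(self, now):
            self.cap_hit_at = now
            print("step 2: alert sent, workloads keep running for 24h")

        def tick(self, now):
            if self.cap_hit_at and now - self.cap_hit_at >= GRACE_PERIOD:
                print("step 3: forced shutdown")
                if self.buffers_used < FREE_BUFFERS_PER_YEAR:
                    self.buffers_used += 1
                    print("step 4: AWS eats the cost of the grace period")
                else:
                    print("step 5: the grace period is billed to the customer")
                self.cap_hit_at = None

    acct = Account()
    acct.on_cap_hit(datetime(2024, 1, 1))
    acct.tick(datetime(2024, 1, 2))   # 24h later: forced shutdown, first free buffer used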
|
| ▲ | wat10000 9 hours ago | parent | prev [-] |
| Millions of businesses operate this way already. There's no way around it if you have physical inventory. And unlike with cloud services, getting more physical inventory after you've run out can take days, and keeping more inventory than you need can get expensive. Yet they manage to survive. |
| |
| ▲ | pixl97 8 hours ago | parent [-] | | And the cloud is actually scarier. You have nearly unlimited liability and are at the mercy of the cloud provider forgiving your debt if something goes wrong. |
|