| ▲ | Havoc 12 hours ago |
| These sorts of things show up about once a day across the three big cloud subreddits, often with larger amounts. And it’s always the same: clouds refuse to provide anything more than (delayed) alerts, and your only option is prayer and begging for mercy. Followed by people claiming with absolute certainty that it’s literally technically impossible to provide hard-capped accounts to tinkerers, despite accounts like that already existing (some Azure accounts are hard-capped by amount, but of course that’s not loudly advertised). |
|
| ▲ | Waterluvian 9 hours ago | parent | next [-] |
| This might be speaking the obvious, but I think that the lack of half-decent cost controls is not intentionally malicious. There is no mustache-twirling villain who has a great idea on how to !@#$ people out of their money. I think it's the interplay between incompetence and having absolutely no incentive to do anything about it (which is still a form of malice). I've used AWS for about 10 years and am by no means an expert, but I've seen all kinds of ugly cracks and discontinuities in design and operation among the services. AWS has felt like a handful of very good ideas, designed, built, and maintained by completely separate teams, littered with a whole ton of "I need my promotion to VP" bad ideas that build on top of the good ones in increasingly hacky ways. And in any sufficiently large tech organization, there won't be anyone at a level of power who can rattle cages about a problem like this, who will want to be the one to actually do it. No "VP of Such and Such" will spend their political capital stressing how critical it is that they fix the thing that will make a whole bunch of KPIs go in the wrong direction. They're probably spending it on shipping another hacked-together service with Web2.0-- er. IOT-- er. Blockchai-- er. Crypto-- er. AI before promotion season. |
| |
| ▲ | sgarland 9 hours ago | parent | next [-] | | > There is no mustache-twirling villain who has a great idea on how to !@#$ people out of their money. I dunno, Aurora’s pricing structure feels an awful lot like that. “What if we made people pay for storage and I/O? And we made estimating I/O practically impossible?” | |
| ▲ | scotty79 9 hours ago | parent | prev | next [-] | | > I think that the lack of half-decent cost controls is not intentionally malicious It wasn't when the service was first created. What's intentionally malicious is not fixing it for years. Somehow AI companies got this right from the get-go. Money up front; no money, no tokens. It's easy to guess why. Unlike hosting infra bs, inference is a hard cost for them. If they don't get paid, they lose (more) money. And sending stuff to collections is expensive and bad press. | | |
| ▲ | otterley 8 hours ago | parent [-] | | > Somehow AI companies got this right from the get-go. Money up front; no money, no tokens. That’s not a completely accurate characterization of what’s been happening. AI coding agent startups like Cursor and Windsurf started by attracting developers with free or deeply discounted tokens, then adjusted the pricing as they figured out how to be profitable. This happened with Kiro too[1] and is happening now with Google’s Antigravity. There’s been plenty of ink spilled on HN about this practice. [1] disclaimer: I work for AWS, opinions are my own | | |
| ▲ | gbear605 8 hours ago | parent [-] | | I think you’re talking about a different thing? The bad practice from AWS et al is that you post-pay for your usage, so usage can be any amount. With all the AI things I’ve seen, either:
- you prepay a fixed amount (“$200/mo for ChatGPT Max”)
- you deposit money upfront into a wallet, if the wallet runs out of cash then you can’t generate any more tokens
- it’s free!

I haven’t seen any of the major model providers have a system where you use as many tokens as you want and then they bill you, like AWS has. |
|
| |
| ▲ | 9 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | duped 8 hours ago | parent | prev | next [-] | | > There is no mustache-twirling villain who has a great idea on how to !@#$ people out of their money. It's someone in a Patagonia vest trying to avoid getting PIP'd. | |
| ▲ | lysace 9 hours ago | parent | prev | next [-] | | All of that is by design, in a bad way. | |
| ▲ | colechristensen 9 hours ago | parent | prev [-] | | AWS isn't for tinkerers and doesn't have guard rails for them, that's it. Anybody can use it, but it's not designed for you to spend $12 per month. They DO have cost anomaly monitoring, and they give you the data so you can set up your own alerts for usage or spend, but it's not a primary feature because they're picking their customers, and it isn't the bottom-of-the-market hobbyist. There are plenty of other services looking for that segment. I have budgets set up and alerts through a separate alerting service that pings me if my estimates go above what I've set for a month. But it wouldn't fix a short-term mistake; I don't need it to. |
|
|
| ▲ | cristiangraz 7 hours ago | parent | prev | next [-] |
| AWS just released flat-rate pricing plans with no overages yesterday. You opt into a $0, $15, or $200/mo plan and at the end of the month your bill is still $0, $15, or $200. It solves the problem of unexpected requests or data transfer increasing your bill across several services. https://aws.amazon.com/blogs/networking-and-content-delivery... |
| |
|
| ▲ | moduspol 9 hours ago | parent | prev | next [-] |
| AWS would much rather let you accidentally overspend and then forgive it when you complain than see stories about critical infrastructure getting shut off or failing in unexpected ways due to a miscommunication in billing. |
| |
| ▲ | DenisM 5 hours ago | parent [-] | | They could have given us a choice though. Sign in blood that you want to be shut off in case of overspend. | | |
| ▲ | simsla an hour ago | parent | next [-] | | You could set a CloudWatch cost alert that scuttles your IAM and effectively pulls the plug on your stack. Or something like that. | |
| ▲ | moduspol 3 hours ago | parent | prev [-] | | As long as "shut off" potentially includes irrecoverable data loss, I guess, as it otherwise couldn't conclusively work. Along with a bunch of warnings to prevent someone accidentally (or maliciously) enabling it on an important account. Still sounds kind of ugly. | | |
| ▲ | DenisM 2 hours ago | parent [-] | | A malicious or erroneous actor can also drop your S3 buckets. Account changes have stricter permissions. The key problem is that data loss is really bad PR, which cannot be reversed. An overcharge can be reversed. In a twisted way it might even strengthen the public image; I have seen that happen elsewhere. |
|
|
|
|
| ▲ | nijave 6 hours ago | parent | prev | next [-] |
I've always been under the impression billing is async, and you really need it to be synchronous unless cost caps work as a soft limit. You can transfer from S3 on a single instance usually as fast as the instance's NIC--100Gbps+ You'd need a synchronous system that checks quotas before each request, and for a lot of systems you'd also need request cancellation (imagine transferring a 5TiB file from S3 and your cap triggers at 100GiB--the server needs to be able to receive a billing violation alert in real time and cancel the request) For anything capped that's already provided to customers, I imagine AWS just estimates and eats the loss Obviously such a system is possible since IAM/STS mostly do this, but I suspect it's a tradeoff providers are reluctant to make |
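The synchronous check-and-cancel idea described above can be sketched in a few lines. This is a toy model, not anything AWS exposes: the budget object and the chunked transfer loop are invented names, and the point is only that the debit happens inline before each chunk rather than in a delayed, async billing pipeline.

```python
class CapExceeded(Exception):
    """Raised when a transfer would push spend past the hard cap."""
    pass

class ByteBudget:
    """Remaining transfer allowance, debited synchronously per chunk."""
    def __init__(self, cap_bytes):
        self.remaining = cap_bytes

    def debit(self, n):
        if n > self.remaining:
            raise CapExceeded("transfer cap reached")
        self.remaining -= n

def stream(source_bytes, budget, chunk_size=4):
    """Send data chunk by chunk, aborting mid-transfer once the cap hits."""
    sent = []
    for i in range(0, len(source_bytes), chunk_size):
        chunk = source_bytes[i:i + chunk_size]
        budget.debit(len(chunk))  # synchronous check, not a delayed alert
        sent.append(chunk)
    return b"".join(sent)
```

A 16-byte "object" against a 10-byte cap gets cancelled after two chunks; that inline `debit` on every chunk is exactly the per-request overhead the comment suggests providers are reluctant to pay.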
|
| ▲ | cobolcomesback 7 hours ago | parent | prev | next [-] |
| AWS just yesterday launched flat rate pricing for their CDN (including a flat rate allowance for bandwidth and S3 storage), including a guaranteed $0 tier. https://news.ycombinator.com/item?id=45975411 I agree that it’s likely very technically difficult to find the right balance between capping costs and not breaking things, but this shows that it’s definitely possible, and hopefully this signals that AWS is interested in doing this in other services too. |
|
| ▲ | strogonoff 9 hours ago | parent | prev | next [-] |
I think it’s disingenuous to claim that AWS only offers delayed alerts and half-decent cost controls. Granted, these features were not there in the beginning, but for years now AWS, in addition to better-known features like strategic limits on auto scaling, has allowed subscribing to price threshold triggers via SNS and performing automatic actions, which could be anything, including scaling down or stopping services completely if costs skyrocket. |
| |
|
| ▲ | jrjeksjd8d 9 hours ago | parent | prev | next [-] |
The problem with hard caps is that there's no way to retroactively fix "our site went down". As much as engineers are loath to actually reach out to a cloud provider, are there any anecdotes of AWS playing hardball and collecting a 10k debt for network traffic? Conversely, the first time someone hits an edge case in billing limits and their site goes down, losing 10k worth of possible customer transactions, there's no way to unring that bell. The second constituency is also, you know, the customers with real cloud budgets. I don't blame AWS for not building a feature that could (a) negatively impact real, paying customers and (b) is primarily targeted at people who by definition don't want to pay a lot of money. |
| |
| ▲ | Havoc 6 hours ago | parent | next [-] | | Keeping the site up makes sense as a default. That's what their real business customers need, so that has priority. But an opt-in "I'd rather you delete data/disable services than send me a 100k bill" toggle with suitable disclaimers would mean people can safely learn. That way everyone gets what they want. (Well, except cloud providers, who presumably don't like limits on their open-ended bills) | |
| ▲ | withinboredom 9 hours ago | parent | prev | next [-] | | Since you would have to have set it up, I fail to see how this is a problem. | |
| ▲ | scotty79 9 hours ago | parent | prev [-] | | I'd much rather lose 10k in customers that might potentially come another day than 10k in an Amazon bill. The Amazon bill feels more unringable. But hey, let's say you have different priorities than me. Then why not both? Why not let me set the hard cap? Why does Amazon insist on being able to bill me more than my business is worth if I make a mistake? |
|
|
| ▲ | sofixa 12 hours ago | parent | prev | next [-] |
It's not that it's technically impossible. The very simple problem is that there is no way of providing hard spend caps without giving you the opportunity to bring down your whole production environment when the cap is met. No cloud provider wants to give their customers that much rope to hang themselves with. You just know too many customers will do it wrong, or will forget to update the cap, or will not coordinate internally, and things will stop working and take forever to fix. It's easier to waive cost overages than deal with any of that. |
| |
| ▲ | ed_elliott_asc 10 hours ago | parent | next [-] | | Let people take the risk - somethings in production are less important than others. | | |
| ▲ | arjie 6 hours ago | parent [-] | | They have all the primitives. I think it's just that people are looking for a less raw version than AWS. In fact, perhaps many of these users should be using some platform that is on AWS, or if they're just playing around with an EC2 they're probably better off with Digital Ocean or something. AWS is less like your garage door and more like the components to build an industrial-grade blast-furnace - which has access doors as part of its design. You are expected to put the interlocks in. Without the analogy, the way you do this on AWS is: 1. Set up an SNS queue 2. Set up AWS budget notifications to post to it 3. Set up a lambda that watches the SNS queue And then in the lambda you can write your own logic which is smart: shut down all instances except for RDS, allow current S3 data to remain there but set the public bucket to now be private, and so on. The obvious reason why "stop all spending" is not a good idea is that it would require things like "delete all my S3 data and my RDS snapshots" and so on which perhaps some hobbyist might be happy with but is more likely a footgun for the majority of AWS users. In the alternative world where the customer's post is "I set up the AWS budget with the stop-all-spending option and it deleted all my data!" you can't really give them back the data. But in this world, you can give them back the money. So this is the safer one than that. |
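The three steps above could be wired up roughly like this. A sketch only: the SNS message shape and the "keep RDS, stop everything else" policy are invented for illustration (real AWS Budgets notifications are formatted differently), and the actual boto3 stop call is left commented out so the decision logic stands alone.

```python
import json

def plan_actions(sns_record, protected_services=("rds",)):
    """Turn one budget-alert SNS record into a list of shutdown actions."""
    msg = json.loads(sns_record["Sns"]["Message"])
    actions = []
    for res in msg.get("resources", []):
        if res["service"] in protected_services:
            continue  # e.g. leave the database and its snapshots alone
        actions.append(("stop", res["service"], res["id"]))
    return actions

def handler(event, context=None):
    """Lambda entry point: fan out over all SNS records in the event."""
    actions = []
    for record in event["Records"]:
        actions.extend(plan_actions(record))
    # In a real deployment you would act on the plan here, e.g.:
    # boto3.client("ec2").stop_instances(InstanceIds=[rid for _, svc, rid
    #                                                 in actions if svc == "ec2"])
    return actions
```

The value of routing through your own lambda, as the comment says, is exactly that the policy is yours: stop instances, flip a bucket private, keep RDS up, whatever "smart" means for your stack.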
| |
| ▲ | callmeal 9 hours ago | parent | prev | next [-] | | >The very simple problem is that there is no way of providing hard spend caps without giving you the opportunity to bring down your whole production environment when the cap is met. And why is that a problem? And how different is that from "forgetting" to pay your bill and having your production environment brought down? | | |
| ▲ | sofixa 2 hours ago | parent [-] | | > And how different is that from "forgetting" to pay your bill and having your production environment brought down? AWS will remind you for months before they actually stop it. |
| |
| ▲ | ndriscoll 9 hours ago | parent | prev | next [-] | | Why does this always get asserted? It's trivial to do (reserve the cost when you allocate a resource [0]), and takes 2 minutes of thinking about the problem to see an answer if you're actually trying to find one instead of trying to find why you can't. Data transfer can be pulled into the same model by having an alternate internet gateway model where you pay for some amount of unmetered bandwidth instead of per byte transfer, as other providers already do. [0] https://news.ycombinator.com/item?id=45880863 | | |
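The reserve-on-allocate model the comment proposes is simple enough to show in a toy ledger. The class name, prices, and cap are all illustrative; the point is that the refusal happens at provision time, so there is nothing to bill (or forgive) after the fact.

```python
class BudgetLedger:
    """Toy 'reserve the cost when you allocate' model for one billing cycle."""
    def __init__(self, cap):
        self.cap = cap          # hard spend cap for the cycle
        self.reserved = 0.0     # cost already committed by live resources

    def allocate(self, resource, monthly_cost):
        """Reserve the resource's cost for the rest of the cycle, or refuse."""
        if self.reserved + monthly_cost > self.cap:
            return False        # refused up front; no surprise bill later
        self.reserved += monthly_cost
        return True

    def release(self, monthly_cost):
        """Free up reserved budget when a resource is deallocated."""
        self.reserved -= monthly_cost

ledger = BudgetLedger(cap=100.0)
assert ledger.allocate("vm-small-1", 30.0)
assert ledger.allocate("vm-small-2", 30.0)
assert not ledger.allocate("vm-large", 50.0)  # would exceed the $100 cap
ledger.release(30.0)                          # tear down one small VM
assert ledger.allocate("vm-large", 50.0)      # now it fits
```

Per-byte data transfer doesn't fit this shape directly, which is why the comment pairs it with flat-rate unmetered bandwidth: a fixed price is reservable, a metered one isn't.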
| ▲ | kccqzy 9 hours ago | parent [-] | | Reserving the cost until the end of the billing cycle is super unfriendly for spiky traffic and spiky resource usage. And yet one of the main selling points of the cloud is elasticity of resources. If your load is fixed, you wouldn’t even use the cloud after a five minute cost comparison. So your solution doesn’t work for the intended customers of the cloud. | | |
| ▲ | ndriscoll 9 hours ago | parent [-] | | It works just fine. No reason you couldn't adjust your billing cap on the fly. I work in a medium-size org that's part of a large one, and we have to funnel any significant resource requests (e.g. for more EKS nodes) through our SRE teams anyway to approve. Actual spiky traffic that you can't plan for or react to is something I've never heard of, and believe is a marketing myth. If you find yourself actually trying to suddenly add a lot of capacity, you also learn that the elasticity itself is a myth; the provisioning attempt will fail. Or e.g. lambda will hit its scaling rate limit way before a single minimally-sized fargate container would cap out. If you don't mind the risk, you could also just not set a billing limit. The actual reason to use clouds is for things like security/compliance controls. | | |
| ▲ | kccqzy 8 hours ago | parent [-] | | I think I am having some misunderstanding about exactly how this cost control works. Suppose that a company in the transportation industry needs 100 CPUs worth of resources most of the day and 10,000 CPUs worth of resources during morning/evening rush hours. How would your reserved cost proposal work? Would it require having a cost cap sufficient for 10,000 CPUs for the entire day? If not, how? | | |
| ▲ | ndriscoll 7 hours ago | parent [-] | | 10,000 cores is an insane amount of compute (even 100 cores should already be able to easily deal with millions of events/requests per second), and I have a hard time believing a 100x diurnal difference in needs exists at that level, but yeah, actually I was suggesting that they should have their cap high enough to cover 10,000 cores for the remainder of the billing cycle. If they need that 10,000 for 4 hours a day, that's still only a factor of 6 of extra quota, and the quota itself 1. doesn't cost them anything and 2. is currently infinity. I also expect that in reality, if you regularly try to provision 10,000 cores of capacity at once, you'll likely run into provisioning failures. Trying to cost optimize your business at that level at the risk of not being able to handle your daily needs is insane, and if you needed to take that kind of risk to cut your compute costs by 6x, you should instead go on-prem with full provisioning. Having your servers idle 85% of the day does not matter if it's cheaper and less risky than doing burst provisioning. The only one benefiting from you trying to play utilization optimization tricks is Amazon, who will happily charge you more than those idle servers would've cost and sell the unused time to someone else. |
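The "factor of 6" above checks out under the stated assumptions (100 cores for the quiet 20 hours, 10,000 cores for the 4 rush hours):

```python
# Core-hours actually consumed per day under the hypothetical load profile.
actual_core_hours = 100 * 20 + 10_000 * 4    # 42,000

# Quota needed to keep the cap high enough for peak capacity all day.
quota_core_hours = 10_000 * 24               # 240,000

ratio = quota_core_hours / actual_core_hours
print(round(ratio, 1))                       # prints 5.7 -- roughly a factor of 6
```

So the cap is ~6x headroom over actual spend, and since (as the comment notes) the quota itself costs nothing, the headroom is free.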
|
|
|
| |
| ▲ | Nevermark 7 hours ago | parent | prev | next [-] | | > No cloud provides wants to give their customers that much rope to hang themselves with. Since there are in fact two ropes, maybe cloud providers should make it easy for customers to avoid the one they most want to avoid? | |
| ▲ | archerx 9 hours ago | parent | prev | next [-] | | Old hosts used to do that. 20 years ago when my podcast started getting popular I was hit with a bandwidth limit exceeded screen/warning. I was broke at the time and could not have afforded the overages (back then the cost per gig was crazy). The podcast not being downloadable for two days wasn’t the end of the world. Thankfully for me the limit was reached at the end of the month. | |
| ▲ | pyrale 9 hours ago | parent | prev | next [-] | | > It's not that it's technically impossible. It is technically impossible. In that no tech can fix the greed of the people making these decisions. > No cloud provides wants to give their customers that much rope to hang themselves with. They are so benevolent to us... | |
| ▲ | scotty79 9 hours ago | parent | prev | next [-] | | I would love to have an option to automatically bring down the whole production once it's costing more than what it's earning. Come to think of it, I'd love this to be the default. When my computer runs out of hard drive space it crashes; it doesn't go out on the internet and purchase storage with my credit card. | |
| ▲ | nwellinghoff 7 hours ago | parent | prev | next [-] | | Orrr AWS could just buffer it for you. Algo: 1) you hit the cap
2) AWS sends an alert, but your stuff still runs at no cost to you for 24h
3) if no response, AWS shuts it down forcefully.
4) AWS eats the “cost” because let's face it, it basically costs them 1/1000th of what they bill you for.
5) you get this buffer 3 times a year. After that, they still do the 24h forced shutdown but you get billed.
Everybody wins. | |
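The grace-period scheme in that comment is a small state machine. A sketch under the comment's own numbers (24h buffer, three free breaches per rolling year); nothing here is a real AWS mechanism.

```python
from datetime import datetime, timedelta

class GraceTracker:
    """Toy model: 24h grace on cap breach, three free breaches per year."""
    FREE_BREACHES_PER_YEAR = 3
    GRACE = timedelta(hours=24)

    def __init__(self):
        self.breaches = []  # timestamps of past cap breaches

    def on_cap_hit(self, now):
        """Record a breach; return (shutdown_deadline, billed_for_grace)."""
        year_ago = now - timedelta(days=365)
        recent = [t for t in self.breaches if t > year_ago]
        self.breaches.append(now)
        billed = len(recent) >= self.FREE_BREACHES_PER_YEAR
        return now + self.GRACE, billed

tracker = GraceTracker()
deadline, billed = tracker.on_cap_hit(datetime(2024, 1, 1))
# First breach of the year: free grace period, forced shutdown in 24h.
```

The rolling-365-day window (rather than a calendar year) is a design choice made here for simplicity; either reading fits the comment.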
| ▲ | wat10000 8 hours ago | parent | prev [-] | | Millions of businesses operate this way already. There's no way around it if you have physical inventory. And unlike with cloud services, getting more physical inventory after you've run out can take days, and keeping more inventory than you need can get expensive. Yet they manage to survive. | | |
| ▲ | pixl97 8 hours ago | parent [-] | | And cloud is really more scary. You have nearly unlimited liability and are at the mercy of the cloud service forgiving your debt if something goes wrong. |
|
|
|
| ▲ | belter 9 hours ago | parent | prev [-] |
| These topics are not advanced...they are foundational scenarios covered in any entry level AWS or AWS Cloud third-party training. But over the last few years, people have convinced themselves that the cost of ignorance is low. Companies hand out unlimited self-paced learning portals, tick the “training provided” box, and quietly stop validating whether anyone actually learned anything. I remember when you had to spend weeks in structured training before you were allowed to touch real systems. But starting around five or six years ago, something changed: Practitioners began deciding for themselves what they felt like learning. They dismantled standard instruction paths and, in doing so, never discovered their own unknown unknowns. In the end, it created a generation of supposedly “trained” professionals who skipped the fundamentals and now can’t understand why their skills have giant gaps. |
| |
| ▲ | shermantanktop 7 hours ago | parent [-] | | If I accept your premise (which I think is overstated) I’d say it’s a good thing. We used to ship software with literally 100lbs of manual and sell expensive training, and then consulting when they messed up. Tons of perverse incentives. The expectation that it just works is mostly a good thing. |
|