| ▲ | Ask HN: How are you keeping AI coding agents from burning money? | |||||||||||||||||||||||||
| 4 points by bhaviav100 18 hours ago | 19 comments | ||||||||||||||||||||||||||
My agents retry a bit more than it should, and there goes my bill up in the sky. I tried figuring out what is causing this but none of the tools helped much. and the worse thing for me is that everything shows up as aggregate usage. Total tokens, total cost, maybe per model. So I ended up hacking together a thin layer in front of OpenAI where every request is forced to carry some context (agent, task, user, team), and then just logging and calculating cost per call and putting some basic limits on top so you can actually block something if it starts going off the rails. It’s very barebones, but even just seeing “this agent + this task = this cost” was a big relief. It uses your own OpenAI key, so it’s not doing anything magical on the execution side, just observing and enforcing. I want to know you guys are dealing with this right now. Are you just watching aggregate usage and trusting it, or have you built something to break it down per agent / task? If useful, here is the rough version I’m using : https://authority.bhaviavelayudhan.com/ | ||||||||||||||||||||||||||
| ▲ | paulwelty an hour ago | parent | next [-] | |||||||||||||||||||||||||
Mine have burned a lot of money! Right now, I'm trying to keep the context smaller. It takes a lot of discipline, though, to have a system that gives enough context to do the work but not so much the agent can go off doing new/crazy stuff. | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | bisonbear 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||
cost control is a policy problem - we certainly don't need to use opus 4.6 for a simple test refactor, but many people (including myself) default to it anyways. we need a way to measure cost / performance for agents on individual repos, with individual types of tasks, to get a better sense of what tasks can be trusted to cheaper agents, and what tasks must be routed to the SOTA | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | DarthCeltic85 17 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||
I had gotten a student/ultra code for antigravity promo for three months, so I was using that, but that finally ran out this month. Currently Im using windstream and flipping between claude as my left brain and code extraction and the higher context but cheaperish models there. honestly though, im getting to a point where im running custom project mds that flip between different models for different things, using list outputs depending on what it finds and runs. (I have two monorepo projects, and one thats a polyglot microengine that jumps using gRPC communication.) The mds are highly specialized for each project as each project deals with vastly different issues. Cycling through the different pro accounts and keeping the mds in place over it all is helping me not kill my wallet. | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | grahammccain 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||
Kinda of an adjacent question but do you think the token/usage way of paying for things will stick? I still think people would rather pay a monthly subscription for a seat. | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | jerome_mc 13 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||
AI outputs often feel like a gacha game. Paradoxically, the 'expensive' tokens are sometimes the cheapest in the long run. In my experience, higher-end models have a much higher 'one-shot' success rate. You aren't just saving on total token count by avoiding loops; you’re saving engineering time, which is always the most expensive resource anyway. | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | rox_kd 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||
In what settings do you mean - there are multiple strategies, I think building your own compaction layer in front seems a bit over-kill ? have you considered implementing some cache strategy, otherwise summary pipelines - I made once an agent which based on the messages routed things to a smaller model for compaction / summaries to bring down the context, for the main agent. But also ensuring you start new fresh context threads, instead of banging through a single one untill your whole feature is done .. working in small atomic incrementals works pretty good | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | Ryand123 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||
yeah i just watch aggregate usage and honestly i hate it. But it works since it's for personal projects so I can control api keys however I want. | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | spl757 12 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||
Don't use tech with deep, unresolved flaws and you won't get fucked. Would you find it acceptable if Postgresql occassionally hallucinated and returned gibberish? Fuck no. Wny is this okay with ANY software? Answer, it's not. AI IS NOT READY. | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | spl757 12 hours ago | parent | prev [-] | |||||||||||||||||||||||||
By not using it. The tech is flawed. It hallucinates. It's not production ready. I've said it before, and I will say it again. Anyone using AI in a production environment is a fucking idiot. | ||||||||||||||||||||||||||