Tell HN: I cut Claude API costs from $70/month to pennies
28 points by ok_orco 8 hours ago | 12 comments
The first time I pulled usage costs after running Chatter.Plus - a tool I'm building that aggregates community feedback from Discord/GitHub/forums - for a day, I saw $2.30. Did the math. $70/month. $840/year. For one instance. Felt sick.

I'd done napkin math beforehand, so I knew it was probably a bug, but still. Turns out it was only partially a bug. The rest was me needing to rethink how I built this thing.

Spent the next couple days ripping it apart: making tweaks, testing with live data, checking results, trying again. What I found was that I was sending API requests too often and not optimizing what I was sending and receiving.

Here's what moved the needle, roughly big to small (besides that bug, which was costing me a buck a day on its own):

- Dropped Claude Sonnet entirely - tested both models on the same data, and Haiku actually performed better at a third of the cost

- Started batching everything - hourly calls were a money fire

- Filter before the AI - "lol" and "thanks" make up a lot of online chatter, and I was paying AI to tell me that's not feedback. That said, I still process agreements like "+1" and "me too."

- Shorter outputs - "H/M/L" instead of "high/medium/low", a 40-char title recommendation

- Strip code snippets before processing - they just reiterate the issue and bloat the call

End of the week: pennies a day. Same quality.

I'm not building a VC-backed app that can run at a loss for years. I'm unemployed, trying to build something that might also pay rent. The math has to work from day one. The upside: these savings let me 3x my pricing tier limits and add intermittent quality checks. Headroom I wouldn't have had otherwise.

Happy to answer questions.
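For the curious, here's roughly what the filter-then-batch step looks like. This is a minimal sketch against the Anthropic Python SDK; the noise list, prompt, and helper names are simplified stand-ins, not my production code:

    import re
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    NOISE = {"lol", "lmao", "thanks", "nice"}  # illustrative; agreements like "+1" still pass

    def worth_processing(msg: str) -> bool:
        # Cheap heuristic gate so we never pay the model to classify noise
        return msg.strip().lower() not in NOISE

    def strip_code(msg: str) -> str:
        # Code snippets mostly reiterate the issue; drop them before sending
        return re.sub(r"```.*?```", "[code omitted]", msg, flags=re.S)

    SYSTEM = ("Classify this community message as feedback or not. "
              "Reply with priority H/M/L and a title of at most 40 chars.")

    def queue_batch(messages: list[str]) -> str:
        """Submit one Message Batch (roughly half the price of live calls)."""
        batch = client.messages.batches.create(
            requests=[
                {
                    "custom_id": f"msg-{i}",
                    "params": {
                        "model": "claude-3-5-haiku-latest",
                        "max_tokens": 64,  # short outputs on purpose
                        "system": SYSTEM,
                        "messages": [{"role": "user", "content": strip_code(m)}],
                    },
                }
                for i, m in enumerate(messages)
                if worth_processing(m)
            ]
        )
        return batch.id  # poll with client.messages.batches.retrieve(batch.id)

Batch results land within 24 hours via client.messages.batches.results(batch_id), which is plenty fast for a feedback digest.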
deepsummer 5 minutes ago
As much as I like the Claude models, they are expensive. I wouldn't use them to process large volumes of data. Gemini 2.5 Flash-Lite is $0.10 per million tokens. Grok 4.1 Fast is really good and only $0.20. They will work just as well for most simple tasks.
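Both expose OpenAI-compatible endpoints, so trying them is mostly a base_url swap. A sketch, assuming current model names (check the pricing pages, these drift):

    from openai import OpenAI

    # Gemini via its OpenAI-compatible endpoint
    gemini = OpenAI(
        api_key="YOUR_GEMINI_KEY",
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    resp = gemini.chat.completions.create(
        model="gemini-2.5-flash-lite",
        messages=[{"role": "user", "content": "Is 'lol nice' actionable feedback? Answer H/M/L."}],
    )
    print(resp.choices[0].message.content)

    # xAI works the same way: base_url="https://api.x.ai/v1" and a grok-* model name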
LTL_FTC 2 hours ago
It sounds like you don’t need immediate LLM responses and could batch-process your data nightly? Have you considered running a local LLM? You may not need to pay for API calls at all. Today’s local models are quite good. I started off with CPU-only inference and even that was fine for my pipelines.
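E.g. with Ollama, a minimal sketch (assumes the local daemon is running and you've pulled a small model; the prompt is just an example):

    import ollama  # pip install ollama

    resp = ollama.chat(
        model="llama3.2",  # any small local model that fits your hardware
        messages=[{"role": "user", "content":
                   "Feedback or noise: 'the export button 404s'? Answer H/M/L."}],
    )
    print(resp["message"]["content"])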
44za12 2 hours ago
This is the way. I actually mapped out the decision tree for this exact process and more here:
gandalfar 2 hours ago
Consider using z.ai as a model provider to further lower your costs.
joshribakoff an hour ago
Have you looked into https://maartengr.github.io/BERTopic/index.html ?
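The core API is small, for what it's worth. A sketch, where load_messages() is a hypothetical stand-in for the scraped Discord/GitHub text (it wants at least a few hundred docs to cluster well):

    from bertopic import BERTopic  # pip install bertopic

    docs = load_messages()  # hypothetical loader: a list of raw message strings
    topic_model = BERTopic(verbose=True)
    topics, probs = topic_model.fit_transform(docs)
    print(topic_model.get_topic_info())  # size + top keywords per discovered topic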
dezgeg 2 hours ago
Are you also adding the proper prompt cache control attributes? I think the Anthropic API still doesn't do it automatically.
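For reference, it's opt-in: you mark the static prefix with a cache_control block yourself. A sketch, where LONG_STATIC_PROMPT is a placeholder for your big reused instructions (note the models have a minimum cacheable prefix length):

    import anthropic

    client = anthropic.Anthropic()

    resp = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=64,
        system=[{
            "type": "text",
            "text": LONG_STATIC_PROMPT,  # placeholder: your long reused instructions
            "cache_control": {"type": "ephemeral"},  # cache the prefix up to here
        }],
        messages=[{"role": "user", "content": "new message to classify"}],
    )
    # resp.usage.cache_creation_input_tokens / cache_read_input_tokens show the effect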
arthurcolle 8 hours ago
Can you discuss a bit more of the architecture?
DeathArrow an hour ago
You can also try cheaper models like GLM, DeepSeek, or Qwen, at least partially.