SimianSci 8 hours ago

Spend at my organization has exceeded $200,000 per month on Anthropic's enterprise tier. The number of outages we have had over these past few months is astounding, and coupled with their horrendous support, it has our executive team furious.

It's a lot of money to be spending for a single nine of reliability.
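For scale, a single nine (90% availability) permits roughly 72 hours of downtime in a 30-day month, while five nines permit under half a minute. A quick sketch of the arithmetic:

```python
# Downtime budget per 30-day month for a given number of "nines" of availability.
def monthly_downtime_hours(nines: int) -> float:
    availability = 1 - 10 ** -nines  # 1 nine -> 0.90, 5 nines -> 0.99999
    return 30 * 24 * (1 - availability)

for n in range(1, 6):
    print(f"{n} nine(s): {monthly_downtime_hours(n):.4f} hours/month")
```

One nine works out to about 72 hours of allowable downtime per month; five nines to about 26 seconds.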

Shakahs 6 hours ago | parent | next [-]

If you are paying API rates (not using Max subscriptions), there's no reason to use Anthropic's API directly; the same models are hosted by both AWS and Google, with better uptime than Anthropic.

JamesSwift 6 hours ago | parent | next [-]

How do things like prompt caching play into that? Would I theoretically have a more stable harness backing my usage?

I'm seriously over the current Claude experience. After seemingly fixing my 4.6 usage by disabling adaptive thinking and moving to max effort, it seems the release of 4.7 has broken that workflow, and I'm 99% certain that disabling adaptive thinking does nothing even on 4.6 now. Just egregious errors in the two days this week since coming back from vacation.

GardenLetter27 5 hours ago | parent | next [-]

AWS Bedrock supports prompt caching, just note that if you use the Converse API you need to set the cache points manually.
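A minimal sketch of what setting a cache point manually can look like (the `cachePoint` content block follows the Converse API docs; the model ID is illustrative, so substitute whatever your account supports):

```python
# Sketch of a Bedrock Converse request with a manual prompt-cache checkpoint.
def build_converse_request(system_prompt: str, user_message: str) -> dict:
    return {
        "modelId": "anthropic.claude-sonnet-4-20250514-v1:0",  # example ID only
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # cache everything above this marker
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_message}]},
        ],
    }

# Sending it would look like:
#   boto3.client("bedrock-runtime").converse(**build_converse_request(...))
request = build_converse_request("You are a code reviewer.", "Review this diff.")
print(request["system"][1])
```

Without the explicit `cachePoint` block, Converse requests simply don't get cached, which is the gotcha the comment above is warning about.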

thepasch 6 hours ago | parent | prev | next [-]

> Would I theoretically have a more stable harness backing my usage?

If you don’t mind an opinionated harness that asks for a pretty specific workflow, but one that works well, use OpenCode.

If you want to spread your wings and feel the sweet kiss of freedom, use Pi.

JamesSwift 6 hours ago | parent | next [-]

I'm looking at moving to Pi and I like the minimal nature, but I disagree with a handful of decisions they make. So I'd likely need to maintain a fork, which is less than ideal.

carterschonwald 2 hours ago | parent | next [-]

check out my pi forks.

JamesSwift 2 hours ago | parent [-]

Ummmmmm, how?

wyre 5 hours ago | parent | prev [-]

What decisions is Mario making that you disagree with? My impression is that Pi is minimal enough that any changes can live on top of it without needing to maintain a fork.

I started developing my own coding agent after using Pi for a couple of months, so I'm curious what you don't like about it.

JamesSwift 5 hours ago | parent [-]

When I hear Mario talk about Pi and his approach, I find myself agreeing with a lot of it. But I also find myself agreeing with a lot of the points from this post: https://www.thevinter.com/blog/bad-vibes-from-pi

mattmanser 4 hours ago | parent [-]

To save others a click (though the article is worth reading): the opinions in question are that bash should be enabled by default with no restrictions, that the agent should have access to every file on your machine from the start, and that npm is the only package manager worth supporting. Bold choices.

He also mentions that Pi ships with no subagents by default.

canadiantim 15 minutes ago | parent [-]

The oh-my-pi harness fixes many of these, like adding subagents.

zackify 6 hours ago | parent | prev [-]

Pi for the win. I have my own AI extend it when I want more specific features; I vibe-coded a Shift+Tab permission control like Claude Code's in 20 minutes.

unshavedyak 15 minutes ago | parent [-]

I find it so funny that many of these harnesses sound like black magic and are completely mystical to me. I use Claude Code every day and yet I can't imagine the workflow of Pi. I also don't care to pay API rates just to experiment with them.

Largely though, I'm happy with Claude Code with IDE integration, so I don't feel the need to migrate. Nonetheless I'm curious.

theplatman 5 hours ago | parent | prev [-]

You can use Claude Code with these other providers.
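For instance, Claude Code can be pointed at Bedrock with a couple of environment variables (variable names follow Claude Code's third-party provider docs; the region and model ID below are examples, so substitute what your account supports):

```shell
# Point Claude Code at Bedrock instead of Anthropic's first-party API.
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-5-20250929-v1:0'  # example inference profile
claude   # launches Claude Code against Bedrock
# Vertex is similar: CLAUDE_CODE_USE_VERTEX=1 plus your Google Cloud project settings.
```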

Hamuko 6 hours ago | parent | prev [-]

The enterprise tier is API pricing only.

https://support.claude.com/en/articles/9797531-what-is-the-e...

Shakahs 3 hours ago | parent | next [-]

Enterprise adds IAM, logging, and analytics, all of which AWS provides for free or for metered usage without needing an enterprise plan.

ranman 6 hours ago | parent | prev [-]

They'll cut you a private offer for Bedrock tokens, but Bedrock has a 32k output limit.

robkop 5 hours ago | parent | next [-]

I use Bedrock with 1M context every day. Not sure that's right.

conception 5 hours ago | parent [-]

4.7 is the first Opus model that's had the 1M context window available on Bedrock.

mastercheif 4 hours ago | parent [-]

Not true. Opus and Sonnet 4.6 support 1m context on Bedrock.

8note 3 hours ago | parent | prev [-]

Isn't that an input limit from API Gateway?

Someone1234 8 hours ago | parent | prev | next [-]

Obviously there is only so much you can say, but is that $200K due to the raw number of seats you have, or are you burning through a lot of raw API usage? I guess I'm trying to understand: large business, or large usage?

SimianSci 7 hours ago | parent [-]

We are in the SMB space; the spend is almost entirely usage for us at this point, rather than seat cost. For context, we are a software firm focused on difficult engineering problems, but I can't divulge much else.

2ndorderthought 6 hours ago | parent [-]

Have you guys considered running your own local models? $200k a month is a ton of money and puts all your eggs in one basket. Or is it easier to just be able to walk away from it all if you are done with it or something changes?

SimianSci 6 hours ago | parent | next [-]

I led the team that did the math and analysis for determining our direction in selecting Anthropic. We initially assumed this was where we would end up, but after some investment exploring our options we found it not worth the trouble.

Local models sound great until you realize you don't get a lot of the features we implicitly expect from hosted models. Many things would require additional investment in operations and setup to get to a comparable system. We ended up wanting things that would require us to roll our own memory system, model harnesses, compliance tooling, and security. It was possible for us to invest in this, but it would have required additional hiring or training to get to a state comparable to the hosted options.

Eventually, I had to recommend against the project, as it was more likely to be an investment in the leading team's resumes than an actual investment in our organization.

nyrikki 5 hours ago | parent | next [-]

To start, I want to be clear that I am trying to understand, not criticize; mistakes are how institutional knowledge grows.

Your last paragraph hints at retention struggles which complicates the issue.

But was vendor risk mitigation not part of the evaluation? I get that most companies view governance and compliance as a pay-to-play issue, but there has always been a problem with rapidly changing areas and single-source suppliers.

I admit to having my own preferences and being almost completely ignorant about what your needs are, but I have seen the value in having a rabbit to pull out of the hat.

If employee retention doesn’t allow for departure of individuals without complete loss of institutional knowledge I guess my position wouldn’t hold.

But during the rise of cloud computing I introduced an openstack install in our sandbox, not because I thought that we would stay on a private cloud but because it allowed our team to pull back the covers and understand what our cloud vendor was doing.

It was an adoption accelerator that enabled us to choose a vendor that was appropriate and to avoid the long tail of implementation.

It was valuable as a pivot when AMD killed SeaMicro on short notice, and the full cloud migration period was dramatically shortened.

I have a dozen other examples, but it is like stock options, volatility and uncertainty dramatically increase the value of keeping your options open.

We will have vendors fold, and a single-source-only story couples your org to the success of that vendor.

IMHO, there is a huge difference between tying your success to an Oracle, which may be ‘safe’ (if expensive) for a captive customer, and doing the same in uncertain markets.

Would you be willing (or able) to share more?

willsmith72 2 hours ago | parent [-]

It's an SMB; if you need redundancy on every third-party dependency, your business will die anyway.

Better to take the risk for most things. If the worst case happens and you have to migrate, you migrate. Otherwise you risk overengineering upfront and guaranteeing reduced productivity rather than merely risking it.

joefourier 5 hours ago | parent | prev | next [-]

> Local models sound great until you realize you don't get a lot of the features we implicitly expect from hosted models. Many things would require additional investment in operations and setup to get to a comparable system. We ended up wanting things that would require us to roll our own memory system, model harnesses, compliance tooling, and security.

That's not local models vs. hosted models, that's using Anthropic's enterprise services. Any local LLM inference engine such as vLLM gives you an OpenAI-compatible API with the same features as a hosted model.
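As a rough sketch (the model name and flags are examples; check the vLLM docs for what fits your hardware):

```shell
# Serve an open-weights model behind an OpenAI-compatible API.
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --port 8000 \
  --enable-prefix-caching   # rough analogue of hosted prompt caching

# Any OpenAI-compatible client can then point at it:
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
       "messages": [{"role": "user", "content": "hello"}]}'
```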

I'm not sure what your use case is, but I personally found Anthropic's offerings lacking and inferior to open-source or custom-built solutions. I have yet to see any "memory" system that's better than markdown files or search, and harnesses for agentic AIs are a dime a dozen.

2ndorderthought 5 hours ago | parent | prev [-]

I don't blame you. I personally would consider revisiting it in the next month or so. A lot of people are saying some of these smaller models, like Qwen 3.6, are basically at Claude Sonnet performance if not better.

That level of hardware, if the performance is enough, is a much smaller investment and gamble.

Either way, I understand the decision. Your product isn't locally hosted LLMs, so why fuss? That said, when I see $1 million plus in external spend, I start wondering about the options. Not saying you did the wrong thing; I think you did the right thing, but things seem to be changing on the local model front, and quite rapidly.

throwaway314155 4 hours ago | parent | prev [-]

Local models perform objectively worse than SotA SaaS models. Your employees will hate this decision.

2ndorderthought 4 hours ago | parent | next [-]

Some of the local models are effectively there. It depends on what scale you need or want. Kimi 2.6 is up there with Opus, granted it's huge; on some benches it's actually better. Qwen 3.6 is up there with Sonnet, but it's nearly microscopic. A lot has changed in the last month.

slopinthebag an hour ago | parent | prev [-]

Only if you're vibe coding, with ambiguous prompts that require the model to fill in a huge number of gaps and basically write the software for you.

The people who don't really know what they're doing (or don't care) need the full power of the SOTA models; those with experience can provide enough context and instruction to make even small local models work.

2ndorderthought an hour ago | parent [-]

Some of the latest batch are even more vibe-code friendly. It's pretty crazy. People are few-shotting small toy games and such with Qwen 3.6. I'm personally not into that workflow, but yeah. It won't be long until the efficiency wave hits and small models are really all people need.

noosphr 8 hours ago | parent | prev | next [-]

A single nine so far. If GitHub is any guide, things will get worse.

smt88 8 hours ago | parent [-]

Why would GitHub be a guide? It's also terrible, but it's a radically different stack from an unrelated company.

StableAlkyne 7 hours ago | parent | next [-]

That, and even before AI, MS was having trouble with GH reliability

shimman 7 hours ago | parent | prev [-]

GitHub, along with MSFT in general, has massive Copilot mandates where workers are being shamed into using slop tools to fix serious ongoing issues. GitHub seems wholly incapable of resolving its issues: money isn't a problem, talent isn't a problem, but business leadership is definitely a major problem.

Look at how other companies are suffering massive outages due to LLMs too, like AWS and Cloudflare: two companies that used to be the best in the industry at uptime but have suddenly faltered quite quickly.

Companies that have even worse standards will quickly realize how problematic these tools are. Hopefully before a recession, because this industry seems to be allergic to profitable businesses, and leaders who have been around since ZIRP have shown zero intelligence in navigating these times.

kentonv 7 hours ago | parent | next [-]

None of the three major Cloudflare outages in the past six months had anything to do with LLMs. They were regular old human mistakes.

We did, however, determine that at least one of them (and perhaps all) would have been easily caught by AI code reviewers, had AI code reviewers been in use. So now we mandate that. And honestly, I love it: the AI reviewer spots all sorts of things that humans would probably miss.

(We also fixed a number of problems around configuration that would roll out globally too fast, leaving no time to notice errors and stop a bad rollout, as well as cases where services being down actually made it hard to revert the change... should be in a much better place now. But again, none of that had to do with LLMs.)

hombre_fatal 36 minutes ago | parent | next [-]

Something unexpected that LLMs robbed from us is the grace of others assuming we failed on our own, e.g. good ol' fashioned human/organizational failure.

a512041364cd 5 hours ago | parent | prev [-]

> None of the three major Cloudflare outages in the past six months had anything to do with LLMs. They were regular old human mistakes.

Is that true? At least one of them seemed to involve LLM-written code from what I saw. (Not to say that human error wasn't _also_ a contributing factor, but I wouldn't say it had _nothing_ to do with LLMs).

> We did, however, determine that at least one of them (and perhaps all) would have been easily caught by AI code reviewers, had AI code reviewers been in use. So now we mandate that. And honestly, I love it, the AI reviewer spots all sorts of things that humans would probably miss.

The reviewer is decent, but the false positive rate is substantial, and the false negative rate is definitely nonzero. Not that you would know that the way our genius CTO talks about it...

7 hours ago | parent | prev [-]
[deleted]
wg0 7 hours ago | parent | prev | next [-]

Speaking of developer tooling spend: IDEs, such as JetBrains', are far harder to build, and I don't think any IDE charges this amount to any customer per month.

I'm not sure how much of a productivity gain $2.5 million per year actually buys.

theptip 7 hours ago | parent | next [-]

Supply and demand - if you think it’s not worth the price, take your dollars elsewhere.

This is the brutal reality; even with the crazy reliability issues, demand is still far outstripping supply at the current price.

wg0 6 hours ago | parent [-]

Run Facebook on a single Proxmox box and demand would still outstrip the supply.

What remains to be seen is whether that demand sustains in the long run at that price point, or flattens out and proves to be super elastic, given that there are many other providers catching up pretty fast.

esafak 6 hours ago | parent | prev [-]

IDEs don't need expensive GPUs to create and serve.

nubinetwork 6 hours ago | parent | prev | next [-]

> single 9 of reliability

Out of curiosity, do you actually use it 24/7? The world doesn't collapse every time o365 goes down... (which is also pretty often)

manacit 6 hours ago | parent | next [-]

In my experience the downtime tends to coincide with peak Pacific-time hours. If you're in PT, it's very inconvenient.

Hamuko 6 hours ago | parent [-]

Yeah, I feel like all of the bad downtimes happen during American business hours. We use GitHub at work in Europe and I don't remember it ever being down or broken between 0700 and 1700 local time.

anonyfox 5 hours ago | parent [-]

That's statistically just luck, then; plenty of outages this year already during Berlin work hours. I do remember the forced breaks with colleagues for sure.

mgh95 6 hours ago | parent | prev [-]

If it's judged only by the time it is expected to be in use (work hours), reliability is likely even worse than the 24/7 measure suggests.

deadbabe 8 hours ago | parent | prev | next [-]

We are spending the equivalent of 32 monthly software engineer salaries on Claude per month.

jonny_eh 6 hours ago | parent | next [-]

Info like this is useless without context: how much revenue does the company earn? How many engineers do they employ? Etc.

SimianSci 7 hours ago | parent | prev | next [-]

Our spend is roughly equivalent to 12.3 software developers when you break it down across all people-related expenses. But we spent a lot of time and energy prior to this focusing on our ability to measure software development output across multiple teams. The delivery improvements are not evenly distributed across teams, but the increases we have seen suggest a better ROI than if we had hired 12 developers.

protonbob 7 hours ago | parent [-]

I guess that holds if you think about your teammates purely as inputs and outputs, and not as people who can improve and contribute in the workplace in other ways.

midasz 6 hours ago | parent | next [-]

It's genuinely hilarious how the same leadership pushing for RTO, because getting people together creates magic, seems to have no issue trading those same people for LLMs churning through specs.

maxrev17 6 hours ago | parent [-]

Haha, nail on the head. So the motive for ‘get your ass back in the office’ was never the one we all heard.

SimianSci 7 hours ago | parent | prev [-]

Respectfully, after a certain level of compensation you are indeed judged purely on input and output. Workplace improvement does not justify your salary.

You will also find that many problems in the harder sciences do not get easier by throwing more bodies at them. Comments like these remind me that some project managers think they'd be able to deliver a baby in one month if they simply had nine women.

oarsinsync 6 hours ago | parent | next [-]

> Respectfully, After a certain level of compensation, you are indeed judged purely off of input and output. Workplace improvement does not justify your salary.

I'd have to disagree. There's a narrow band in the middle where that's true, but once you exceed it, your personal inputs and outputs matter less and less, and the contributions you make to the overall workplace, and how well you enable those around you, make up a larger part of why you're compensated.

Even as an IC, the more you're able to mentor and elevate the people around you, the more your compensation will grow (if you're in the right place, and thus already at the right earnings bracket)

paganel 6 hours ago | parent | prev [-]

> you are indeed judged purely off of input and output

That's not how successful (software, in this case) teams are made.

SimianSci 5 hours ago | parent [-]

I would agree if the team I'm on were still growing/scaling. However, we are well past our scaling phase, and at this point our concern is maintaining multi-million-dollar contracts with a tight, well-compensated team.

cactusplant7374 8 hours ago | parent | prev [-]

Is it worth it?

lolive 7 hours ago | parent | next [-]

He was fired before answering.

[but as his manager I can tell you:] YES !!!!

deadbabe 5 hours ago | parent | prev [-]

No, we can literally buy our own hardware for what we spend in a month and host our own local LLMs for company usage.

nomel 4 hours ago | parent [-]

> and host our own local LLMs for company usage.

What local alternative could replace your Anthropic use? I have found none. I don't think many have, which is why most of us pay Anthropic, rather than using one of the numerous, far cheaper, cloud services that host "local" class models.

Most of us are paying for access to proprietary SOTA models, rather than hosting.

an hour ago | parent [-]
[deleted]
walrus01 8 hours ago | parent | prev | next [-]

Five nines? No, nine fives

bayarearefugee 8 hours ago | parent | prev | next [-]

> has our executive team furious

And yet they will continue to spend wheelbarrows full of money with Anthropic because they want so badly to reach the point where they can fire you.

SimianSci 7 hours ago | parent [-]

I think there is a lot of baseless fury behind your words, but my regular interactions with my leadership don't lead me to think they have the end goal of replacing labor. We're blessed to have leadership with technical backgrounds, so the tools are regarded more as significant intelligence enhancers for already exceptionally smart engineers, rather than replacements.

It doesn't seem like wheelbarrows of money to us when you consider the average AWS/Azure bill.

protonbob 7 hours ago | parent | next [-]

Not ever hiring juniors and eventually mids is just replacing labor with extra steps.

SimianSci 7 hours ago | parent | next [-]

Throwing bodies at a problem doesn't always scale. There are many difficult problems that do not get easier by throwing more juniors or mid level engineers at them.

subscribed 7 hours ago | parent | prev [-]

I think the message you responded to already refuted your point of view.

sillysaurusx 6 hours ago | parent | prev | next [-]

Huh? Your other comment explicitly said you were replacing labor: https://news.ycombinator.com/item?id=47939146

> the increases that we have seen suggest a better ROI than if we had hired 12 developers.

You can’t argue “we were able to get away with not hiring more developers” and also say you aren’t replacing labor.

Morally I trend towards your side of things, but it’s also important to be realistic about what you’re actually doing. Money is going towards Anthropic and not towards new hires. That’s a replacement of labor. It doesn’t matter what the end goal was.

keybored 3 hours ago | parent | prev | next [-]

> I think there is a lot of baseless fury behind your words,

Hardly baseless when people have been gloating about how programming as a job is ending any day now for the last year at least.

> It doesn't seem like wheelbarrows of money to us when you consider the average AWS/Azure bill.

You didn’t mention the size of the company so yeah.

therobots927 6 hours ago | parent | prev [-]

“Baseless fury”

I'm glad your leadership isn't trying to fire everyone. But in case you've been living under a rock, tech layoffs are at all-time highs. Companies are rewarded by the public markets for laying off workers.

Simultaneously we have AI industry leaders warning of an employment apocalypse once AGI is achieved.

And you think it’s baseless. Have some class bro.

boc 7 hours ago | parent | prev | next [-]

Seems to be back now (Claude Code, at least).

Scarbutt 5 hours ago | parent | prev | next [-]

Is the $200k just for development, or do the products being developed require AI?

mihaaly 6 hours ago | parent | prev | next [-]

I wonder if self-hosted models would be a sensible step for your organization.

SilverElfin 6 hours ago | parent | prev | next [-]

They must have hired absolutely incompetent leaders on the core software and infrastructure side. Sure, their AI research is great, but it's amateur hour, or just vibe-coded slop top to bottom. It seems like every single day people are talking about outages, billing issues, or secret changes to how Claude works.

33MHz-i486 6 hours ago | parent [-]

They're getting high on their own supply, and instead really need to hire some senior engineers.

cactusplant7374 8 hours ago | parent | prev | next [-]

Imagine how much money they would save if they switched to Codex.

subscribed 7 hours ago | parent [-]

Not everyone can (due to corporate compliance requirements, e.g. the ease of ensuring the LLM doesn't train on anything).

Besides, Codex wasn't always the answer.

simianparrot 7 hours ago | parent | prev [-]

Just give them more money, surely it'll get better.

/s