FuriouslyAdrift 2 hours ago

I work for a tiny little company ($150MM annual rev with 9% net) and we are already looking at dropping $100k on hardware to run local models because, for us, they're "good enough."

Our estimated spend for AIaaS would exceed that cost in less than a year.
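
The break-even claim above can be sketched as simple arithmetic. Only the $100k hardware figure comes from the comment; the monthly AIaaS spend and the running cost (power, hosting, upkeep) below are hypothetical placeholders, and the helper name is made up for illustration.

```python
def breakeven_months(hardware_cost, monthly_api_spend, monthly_running_cost=0.0):
    """Months until avoided API spend pays back the hardware purchase.

    Returns None when self-hosting costs as much or more per month than
    the API, in which case the hardware never pays for itself.
    """
    monthly_savings = monthly_api_spend - monthly_running_cost
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


HARDWARE_COST = 100_000        # USD, from the comment
ASSUMED_API_SPEND = 10_000     # USD/month, hypothetical
ASSUMED_RUNNING_COST = 1_000   # USD/month, hypothetical

months = breakeven_months(HARDWARE_COST, ASSUMED_API_SPEND, ASSUMED_RUNNING_COST)
print(f"Break-even after about {months:.1f} months")
```

With these placeholder numbers the payback lands at roughly 11 months, which is consistent with "less than a year", but the result moves a lot with the assumed monthly spend.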

In a few years, there will be hardware capable of running frontier models good enough for most things at accessible prices for even tiny companies.

simplyluke 2 hours ago | parent | next [-]

Yeah, that's the part that just seems to be wildly under-discussed to me.

If open source models are ~3-6 months behind SOTA, and ~opus4.6 capabilities are good-enough for product market fit, do the frontier labs have half a decade to catch up on their prior burn?

AI cost ballooning faster than companies can afford is becoming a very common topic in my circles right now. The era of "I'll pay infinitely more for marginal gains" is over from what I can tell.

swalsh 13 minutes ago | parent | next [-]

Open source models, especially Qwen, are pretty dang good. But it's not Opus 4.6; the evals don't tell the full story. I question the assumption that open source models are 3-6 months out.

doug_durham an hour ago | parent | prev | next [-]

Open source models that you can run locally are much more than 3 to 6 months behind. Six months ago was the November inflection for Claude. No open source model is as good as Claude Opus 4.6.

jobs_throwaway an hour ago | parent | next [-]

It depends what you mean by locally. I don't foresee running a model on my laptop anytime soon to power a coding agent. Far more likely is an infra team at my company operating an open source model on cloud infrastructure. When they're already paying $1000 / month / dev, it starts to pencil pretty quickly.
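
A rough way to see when this "pencils": compare the per-developer API spend against a shared infrastructure bill. The $1000/month/dev figure is from the comment; the monthly infra cost is a hypothetical placeholder, as is the function name.

```python
import math


def min_devs_for_self_hosting(monthly_infra_cost, api_spend_per_dev):
    """Smallest team size whose combined API spend covers the shared infra bill."""
    return math.ceil(monthly_infra_cost / api_spend_per_dev)


API_SPEND_PER_DEV = 1_000     # USD/month/dev, from the comment
ASSUMED_INFRA_COST = 30_000   # USD/month for hosted open-model inference, hypothetical

team_size = min_devs_for_self_hosting(ASSUMED_INFRA_COST, API_SPEND_PER_DEV)
print(f"Self-hosting breaks even at {team_size} developers")
```

Under that assumed infra bill, a team of 30 developers already covers it. The sketch ignores the staff time needed to operate the deployment, which would push the threshold higher.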

simplyluke an hour ago | parent | prev | next [-]

> that you can run locally

That's doing a lot of work here.

The future I see isn't most companies buying hundreds of thousands in hardware to run models, it's them adding a line item to their AWS bill. Inference costs on the larger hosted open source models are dramatically lower than the frontier labs API pricing.

apocalyptic0n3 an hour ago | parent [-]

> it's them adding a line item to their AWS bill

That's the future Amazon sees too. We just had a week long session with the AWS team and they pushed that to us multiple times.

PeterStuer an hour ago | parent | prev | next [-]

Many business tasks do not need the latest frontier models. I have had a production system running since early GPT-4o. It now runs with GPT-5.2, not for improvements, but because it is cheaper. I could invest in switching to a local model (I tried, and it works well enough), but API costs for this task are so low that they barely scratch $30/month. So I am using the local machine for other things and leaving the inference on OpenAI, for now.

applfanboysbgon 28 minutes ago | parent | prev | next [-]

Opus 4.6 is a February model. Every time this subject comes up it seems like people post intentionally misleading things and move the goalposts.

The goalpost we've been bludgeoned with over and over again is that, in particular, Everything Changed in November 2025. That GPT 5.2 and Claude 4.5 were the inflection point. That is actually 6 months ago. And DeepSeek 4 is already there.

> run locally

You can't run DeepSeek locally on consumer hardware[1], but you can on enterprise hardware, and enterprise spend is the subject of this conversation -- and even if you aren't self-hosting, it doesn't matter, because you can just get your inference from one of the many companies serving DeepSeek, who trivially undercut the pricing of OpenAI/Anthropic because they didn't have to spend hundreds of billions on training a frontier model from scratch but instead only invest in supporting inference, which is already profitable.

[1] Since this misconception comes up all the time, I'll go ahead and pre-empt it: no, training a 32b parameter model on outputs from DeepSeek and running that locally is not "running DeepSeek", despite the hundreds of stupid articles and Youtube videos making that idiotic claim that they're running it on a 5090.

simonw 23 minutes ago | parent [-]

> You can't run DeepSeek locally on consumer hardware

Maybe not DeepSeek v4 Pro, but I've run DeepSeek v4 Flash on my 128GB MacBook Pro using antirez's carefully quantized https://github.com/antirez/ds4 and it's impressive.

applfanboysbgon 3 minutes ago | parent [-]

Oh sure, yeah, that's nothing to sneeze at either. I think unqualified "DeepSeek" should generally refer to the main model, though, especially in the context of GPT5.2-grade quality.

PunchyHamster an hour ago | parent | prev [-]

But one will be in a few months. And then you have the choice of paying, say, $100k for hardware and just the power cost (or paying someone to do that for you), or paying way, way more for your team to have access to a marginal improvement.

And a 5% worse model for 10% of the price of the bleeding edge will be worth it for the majority of people.

w29UiIm2Xz an hour ago | parent | prev | next [-]

If only the AI era was born in ZIRP.

sailfast 6 minutes ago | parent [-]

Better now than ZIRP for me - at least people are asking timid questions about the unit economics and how long the runway is _early_ while also spending absolutely insane amounts of money on this bet. During ZIRP, these companies would have turned down any investor asking questions. Less contagion when rates aren't zero hopefully? :grimace:

svara an hour ago | parent | prev [-]

There's still a lot of room for the best models to get better at coding.

Your argument rests on the "for marginal gains" part but it's really not clear that the gains are marginal in the foreseeable future.

stopachka 5 minutes ago | parent | prev | next [-]

I don't quite understand, what would 100K buy you?

AFAIK you would get about 5 concurrent users, with a max context window of ~128K tokens on the larger models.

This wouldn't be good enough for coding -- are you guys thinking of using it for something else?

EvanAnderson 2 hours ago | parent | prev | next [-]

> ...we are already looking at dropping $100k on hardware to run local models...

Just think how much further that $100K would have gone if the hardware market wasn't so screwed-up.

Anecdote: I priced-out adding 1TB of RAM to a four node cluster a couple months ago. The cluster was purchased in fall of 2024 w/ 4 nodes, each with 256GB RAM. The nodes cost just over $14K apiece back in 2024 (entire box, not just the RAM).

Dell wanted >$90K a couple months ago to add 256GB to each node.

cyberax an hour ago | parent [-]

> Dell wanted >$90K a couple months ago to add 256GB to each node.

RAM is expensive, but not THAT expensive. I just bought 128GB for about $5k for our build cluster (it's not even for AI, sigh). Even if you need larger-sized DIMM sticks, it's still going to be in the vicinity of $15k tops.
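
For comparison, the two quotes in this exchange can be put on a per-GB basis. This is a minimal sketch using only the numbers stated in the thread; it assumes "128Gb" means gigabytes and treats the ">$90K" Dell quote as a lower bound. The DIMM types may differ, so the figures are only roughly comparable.

```python
def price_per_gb(total_price_usd, total_gb):
    """Price in USD per gigabyte of RAM."""
    return total_price_usd / total_gb


# Dell quote: more than $90K to add 256 GB to each of 4 nodes.
dell_gb = 4 * 256
dell_rate = price_per_gb(90_000, dell_gb)  # lower bound, since the quote was ">$90K"

# Open-market purchase: about $5k for 128 GB.
open_market_rate = price_per_gb(5_000, 128)

# What the same 1 TB upgrade would cost at the open-market rate.
upgrade_at_open_market = open_market_rate * dell_gb

print(f"Dell quote:  ${dell_rate:,.2f}/GB")
print(f"Open market: ${open_market_rate:,.2f}/GB")
print(f"1 TB at the open-market rate: ${upgrade_at_open_market:,.0f}")
print(f"Dell premium: {dell_rate / open_market_rate:.2f}x")
```

On these numbers the Dell quote works out to about $88/GB against about $39/GB on the open market, a premium of roughly 2.25x. At the open-market rate the full 1 TB would come to about $40K, which is above the $15k estimate but still less than half the Dell quote.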

EvanAnderson 17 minutes ago | parent | next [-]

It was crazy. I found the part on the open market for a lot less but the edict from the Customer was to buy from Dell to keep the support entitlement intact. That inflated the price to an astronomical level to be sure.

I haven't had problems w/ Dell support and 3rd party memory, personally, but given the machines' application I understood the concern.

MASNeo an hour ago | parent | prev | next [-]

On prem AI makes sense for more than just the cost. More control, IP, model improvements you can keep, data privacy to name a few. People will realize that AI is not like compute the moment they get their own knowledge sold back at a premium.

arbuge 2 hours ago | parent | prev | next [-]

> In a few years, there will be hardware capable of running frontier models good enough for most things at accessible prices for even tiny companies.

What makes you so confident about this prediction? Hardware costs haven't exactly been cratering recently.

cmdrk an hour ago | parent | prev | next [-]

Do you think this will be a trend for larger companies as well?

The decadal move to all-cloud-all-the-time killed off in-house hardware teams while the C-suite chased their OpEx dreams.

It would be interesting if we come full circle on this.

disiplus 42 minutes ago | parent | prev | next [-]

Same, but you need more than 100k of hardware to run something like Kimi K2.6 for a bigger team. On the other hand, there is ds4 flash, which you can run on a MacBook with 128GB RAM, and that one is perfectly usable for a lot of tasks.

https://github.com/antirez/ds4

mv4 an hour ago | parent | prev | next [-]

I configured a dual DGX Spark cluster, and it's certainly "good enough" for my agentic and coding needs.

datadrivenangel an hour ago | parent [-]

What models are you using on that? My experiences with Apple hardware have convinced me that it is not really good enough for coding locally.

irishcoffee an hour ago | parent [-]

It isn’t the models, it’s the closed api and the tooling associated with it. It’s driving me crazy how not-talked-about this is.

datadrivenangel an hour ago | parent [-]

As in the coding harnesses?

alex_suzuki 2 hours ago | parent | prev | next [-]

I’m curious: are you spending on beefy developer machines, or some kind of shared local inference server? Would be interested to know more if it’s the latter.

irishcoffee 2 hours ago | parent [-]

I am aware of at least a handful of companies doing the latter. I don’t work for them and cannot speak to their setup.

nonethewiser 2 hours ago | parent | prev | next [-]

What models? Last I tried different local models, there was a pretty big difference from frontier.

awesome_dude 2 hours ago | parent | prev [-]

> In a few years, there will be hardware capable of running frontier models good enough for most things at accessible prices for even tiny companies.

I was going to say - the models are just going to keep growing at a pace exceeding the pace of hardware pricing/availability.

But then I realised that, far more likely, there will be a plateau reached (again) where nobody is seeing gains, and at that point hardware will catch up.