| ▲ | GaryBluto 8 hours ago |
| Luckily local AI is becoming more feasible every day. |
|
| ▲ | Someone1234 8 hours ago | parent | next [-] |
| It feels more and more like OpenAI/Anthropic aren't the future but Qwen, Kimi, or Deepseek are. You can run them locally, but that isn't really the point; it's about the democratization of service providers. You can run any of them on a dozen providers with different trade-offs/offerings OR locally. They won't ever be SOTA due to money, but "last year's SOTA" at 1/4 the cost or less may be good enough. More quantity, more flexibility, at lower edge quality. It can make sense. A 7% dumber agent TEAM vs. a single objectively superior super-agent. That's the most exciting thing going on in that space. New workflows opening up not due to intelligence improvements but cost improvements for "good enough" intelligence. |
| |
| ▲ | 2ndorderthought 3 hours ago | parent | next [-] | | You can run local models on junker laptops for specific tasks that are about as good as last year's SOTA. If the compute hardware manufacturing shortage weren't happening, a lot more people would be running two-months-ago SOTA locally right now. Funny thoughts... | |
| ▲ | echelon 8 hours ago | parent | prev [-] | | Open Source isn't even within 50% of what the SOTA models are. Benchmarks are toys, real world use is vastly different, and that's where they seriously lag. Why should anyone waste time on poorer results? I'd rather pay my $200/mo because my time matters. I'm not a poor college student anymore, and I need more return on my time. I'm not shitting on open weights here - I want open source to win. I just don't see how that's possible. It's like Photoshop vs. Gimp. Not only is the Gimp UX awful, but it didn't even offer (maybe still doesn't?) full bit depth support. For a hacker with free time, that's fine. But if my primary job function is to transform graphics in exchange for money, I'm paying for the better tool. Gimp is entirely a no-go in a professional setting. Or it's like Google Docs / Microsoft Office vs. LibreOffice. LibreOffice is still pretty trash compared to the big tools. It's not just that Google and Microsoft have more money, but their products are involved in larger scale feedback loops that refine the product much more quickly. But with weights it's even worse than bad UX. These open weights models just aren't as smart. They're not getting RLHF'd on real world data. The developers of these open weights models can game benchmarks, but the actual intelligence for real world problems is lacking. And that's unfortunately the part that actually matters. Again, to be clear: I hate this. I want open. I just don't see how it will ever be able to catch up to full-featured products. | | |
| ▲ | twobitshifter 7 hours ago | parent | next [-] | | Unless you are getting outside of your comfort zone and taking a month off from your $200 subscription every other month, I can’t see how you can make the universal claim that the open weights models are all 50% as good. Just today, DeepSeek released a new model, so nobody knows how that will compare; a week ago it was Gemma 4, etc. I’m okay with you making a comparison, but state the model and the timeframe in which it was tested that you are basing your conclusions on. | |
| ▲ | MostlyStable 8 hours ago | parent | prev | next [-] | | I think that there will come a point when open source models are "good enough" for many tasks (they probably already are for some tasks; or at least, some small number of people seem happy with them), but, as you suggest, it will likely always (for the foreseeable future at least) be the case that closed SOTA models are significantly ahead of open models, and any task which can still benefit from a smarter model (which will probably always remain some large subset of tasks) will be better done on a closed model. The trick is going to be recognizing tasks which have some ceiling on what they need and which will therefore eventually be doable by open models, and those which can always be done better if you add a bit more intelligence. | |
| ▲ | bachmeier 7 hours ago | parent | prev | next [-] | | > Benchmarks are toys, real world use is vastly different...Why should anyone waste time on poorer results? I'd rather pay my $200/mo because my time matters. This kind of rhetoric is not helpful. If you want to make a point, then make one, but this adds nothing to the conversation. Maybe open source models don't work for you. They work very well for me. | |
| ▲ | lelanthran 3 hours ago | parent | prev | next [-] | | > Open Source isn't even within 50% of what the SOTA models are. The gap has been shrinking with each release, and the SOTA has already run into diminishing returns for each extra unit of data+computation it uses. Do you really want to bet that the gap will not eventually be a hair's breadth? | |
| ▲ | kube-system 8 hours ago | parent | prev | next [-] | | There's going to be a day when we look back at $200/mo price tags and say "wow that was cheap". The breakeven at this price is 6 minutes of productivity per work day for an engineer making $200k. | | |
| ▲ | cheschire 7 hours ago | parent | next [-] | | Okay, but then by that logic a person making only $20k would break even at about an hour. Are you suggesting that someone making $20k should be spending $200/mo on Claude? | | |
| ▲ | kube-system 6 hours ago | parent [-] | | I'm talking about the cost of labor. If you pay someone $20,000 for labor, and they save 65 minutes worth of labor per day using a $200/mo Claude subscription, you are better off buying the Claude subscription. |
| |
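The breakeven arithmetic in the subthread above can be sketched out. This is a back-of-the-envelope sketch only: the 21 working days per month and 8-hour day are assumptions, and it assumes saved minutes convert 1:1 into labor cost.

```python
# Back-of-the-envelope: how many minutes of labor per work day a
# subscription must save to pay for itself, given an annual salary.
# Assumes ~21 working days/month and an 8-hour work day.
def breakeven_minutes_per_day(annual_salary, monthly_cost=200.0,
                              work_days_per_month=21, hours_per_day=8):
    daily_cost = monthly_cost / work_days_per_month
    labor_cost_per_minute = annual_salary / (12 * work_days_per_month
                                             * hours_per_day * 60)
    return daily_cost / labor_cost_per_minute

# ~6 minutes/day at a $200k salary; ~an hour/day at $20k.
print(round(breakeven_minutes_per_day(200_000), 1))
print(round(breakeven_minutes_per_day(20_000), 1))
```

Both figures in the thread (6 minutes for $200k, about an hour for $20k) fall out of the same formula; only the salary changes.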
| ▲ | echelon 6 hours ago | parent | prev [-] | | Everyone is arguing why I'm wrong or that I should have presented more data. You've got the real insight with this claim. This is the way the world is moving. Open source isn't even going where the ball is being tossed. There is no leadership here. You're spot on. If the cost to deliver a unit of business automation is: A. $1M with human labor
B. $700k human labor + open source models
C. $500k human labor + $10,000 in claude code max (duration of project)
D. $250k with humans + $200k claude code "mythos ultra"
The one that will get picked is option "D". Your poor college students and hobbyists will be on option "B". But this won't be as productive, as evidenced by the human labor input costs. Option "C" will begin to disappear as models/compute get more expensive and capable. Option "A" will be nonviable. Humans just won't be able to keep up. Open source strictly depends on models decreasing their capability gap. But I'm not seeing it. Targeting home hardware is the biggest smell. It's showing that this is non-serious hobby tinkering and has no real role in business. For open source to work and not to turn into a toy, the models need to target data center deployment. | | |
| ▲ | kube-system 5 hours ago | parent [-] | | Yeah, I don't wanna shit on open source; there will certainly be uses for all different kinds of models. The real money in this market, though, is going to be made in the C-suite, and they don't really care about the model. They don't care if it's open source, closed source, or what it is. They don't want to buy a model. They're interested in buying a solution to their problems. They're not going to be afraid of a software price tag -- any number they spend on labor is far more. Labor is something like 50%+ of the Fortune 500's operating expenses -- capturing any chunk of this is a ridiculous sum of money. |
|
| |
| ▲ | Someone1234 8 hours ago | parent | prev | next [-] | | > Open Source isn't even within 50% of what the SOTA models are. When was the last time you used any of them? Because a lot of people are actively using them for 9-5 work today; I count myself in that group. That opinion feels outdated, like it was formed a year+ ago and held onto, or based on highly quantized versions and/or small non-Thinking models. Do you really think Qwen3.6, for a specific example, is "50%" as good as Opus 4.7? Opus 4.7 is clearly and objectively better, no debate on that, but the gap isn't anywhere near that wide. I'd call "20%" hyperbole; the true difference is difficult to measure exactly, but sub-10% for their top-tier Thinking models is likely. | |
| ▲ | cwnyth 6 hours ago | parent | next [-] | | Their opinion is behind on LibreOffice, too. I won't defend GIMP's monstrosity, but I finished a whole dissertation, do all my regular spreadsheet work (that isn't done via R), and have created plenty of visual mockups with LibreOffice. Plus, I don't have to deal with a spammy Windows environment. Sure, we use Google Drive, too, but that's just for sharing documents across offices, not for everyday use. For that, the open source model is a clear winner in my book. | |
| ▲ | vlovich123 8 hours ago | parent | prev [-] | | Qwen3.6 at which model size and quantization? I already think Opus 4.6 is usable but still dumb as bricks. A 20% cut off of that feels like it would still be unusable. And that's not even getting to the annoyance of setting everything up to run locally & getting HW that can run it locally, which basically looks like a MacBook M4 these days, as the x86 side is ridiculously pricey to get decent performance out of models. | | |
| ▲ | Someone1234 4 hours ago | parent [-] | | At their highest model size and quant. We are discussing price and quality at the top, not what you can run on the lower end. So the starting point is Opus 4.7 pricing and we're contrasting alternatives near the top end (offered across multiple providers). Also I said 20% was hyperbole, meaning far too high. | | |
| ▲ | vlovich123 4 hours ago | parent [-] | | That makes no sense, because the largest Qwen models are not even open weight, so I’m not sure how that’s any different. | | |
| ▲ | Someone1234 4 minutes ago | parent [-] | | Right, which isn't what we're discussing, since I mentioned "across multiple providers" in every comment about this topic. |
|
|
|
| |
| ▲ | oceanplexian 7 hours ago | parent | prev | next [-] | | > Benchmarks are toys, real world use is vastly different, and that's where they seriously lag. I'm not disagreeing per se, but if you think the benchmarks are flawed and "my real world usage" is more reflective of model capabilities, why not write some benchmarks of your own? You stand to make a lot of money and gain a lot of clout in the industry if you've figured out a better way to measure model capability; maybe the frontier labs would hire you. | |
| ▲ | bandrami 7 hours ago | parent | prev | next [-] | | > Why should anyone waste time on poorer results? Because in almost no real-world project is "programming time" the limiting factor? | | |
| ▲ | dymk 7 hours ago | parent [-] | | No, it's the rate at which you can solve problems, and weaker models waste your time because they don't solve problems at the same speed. |
| |
| ▲ | conrs 7 hours ago | parent | prev | next [-] | | IMO it's a new and different model. We're engineers, and we're rich. It's not going to be good enough for us. But the much larger market by far is all the people who used to HAVE to work with engineers. They now have optionality; the pendulum is going to swing. | |
| ▲ | swader999 7 hours ago | parent | prev | next [-] | | Also, this space will be (and perhaps already is for some of us) an arms race. Sure, you can go local, but hosted will always be able to offer more, and if you want to be competitive, you'll need to be using the most capable. | |
| ▲ | nancyminusone 6 hours ago | parent | prev | next [-] | | People pirate Photoshop and Office if they don't want to pay for them, making them as "free" as GIMP. If there is a free option, people will use it. Never underestimate the cheapskates. | |
| ▲ | kardos 6 hours ago | parent | prev | next [-] | | If sharing all of your code with the closed providers is OK, then it works. If that is a blocker, open weights becomes much more compelling... | |
| ▲ | joquarky 5 hours ago | parent | prev | next [-] | | What will you do when they stop burning cash and the $200 plan becomes $2000? | |
| ▲ | brazukadev 8 hours ago | parent | prev | next [-] | | > Open Source isn't even within 50% of what the SOTA models are Who said so? GLM 5.1 is 90% of Opus, at least. Some people are quite happy with Kimi 2.6 too. I haven't tried Deepseek 4 yet, but I'm also hearing it is as good as Opus. You might be confusing open source models with local models. It is not easy to run a 1.6T model locally, but they are not 50% of SOTA models. | |
| ▲ | jawilson2 3 hours ago | parent | prev [-] | | I think the problem is that we're all waiting for the patented Silicon Valley Rug Pull and ensuing enshittification, where there are a dozen tiers of products, you need 4 of them, and they now cost $2000/month. I want to hedge against that. |
|
|
|
| ▲ | fourside 8 hours ago | parent | prev | next [-] |
Maybe for folks who are deep into this, but it’s not exactly accessible. I tried reading up on it a couple of months ago, but parsing through what hardware I needed, which model and how to configure it (model size vs quantization), and how I’d get access to the hardware (which for decent results in coding runs $4k-$10k new, last I checked) -- it had a non-trivial barrier to entry. I was trying to do this over a long weekend and ran out of time. I’ll have to look into it again because having the local option would be great. Edit: the replies to my comment are great examples of what I’m talking about when I say it’s hard to determine what hardware I’d need :). |
| |
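Part of what makes the "model size vs quantization" question above confusing is that the hardware you need falls out of simple arithmetic: parameter count times bytes per weight, plus runtime overhead. A rough sketch, where the ~20% overhead factor for KV cache and activation buffers is an assumption, not a fixed rule:

```python
# Rough memory footprint for running a dense model locally:
# params * bytes-per-weight, padded for KV cache / activations / buffers.
# Bit widths follow common local-inference quantization levels.
BITS_PER_WEIGHT = {"fp16": 16, "q8": 8, "q4": 4}

def model_footprint_gb(params_billions, quant="q4", overhead=1.2):
    # 1B params at 1 byte each is ~1 GB, so scale by bits/8.
    weight_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return weight_gb * overhead

# A 27B model at 4-bit quantization lands around 16 GB -- inside a
# 24 GB GPU or a 32 GB unified-memory machine. The same model at fp16
# roughly quadruples that.
print(round(model_footprint_gb(27, "q4"), 1))
print(round(model_footprint_gb(27, "fp16"), 1))
```

This is why the answers people give vary so much: the same model name can imply anywhere from ~16 GB to ~65 GB depending on the quantization chosen.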
| ▲ | imetatroll 3 hours ago | parent | next [-] | | For me the big hangup is the hardware. If I could find a simple guide to putting together a machine that I can run off an outlet in my home, I am sold. The problem is that I haven't found this yet (though I suppose I haven't looked very hard either). | |
| ▲ | jonaustin 7 hours ago | parent | prev | next [-] | | Just get a decent MacBook, use LM Studio or OMLX and the latest Qwen model you can fit in unified RAM. Hooking up Claude Code to it is trivial with omlx. https://github.com/jundot/omlx | |
| ▲ | root_axis 8 hours ago | parent | prev [-] | | > new hardware runs $4k-$10k last I checked Starting closer to 40k if you want something that's practical. 10k can't run anything worthwhile for SDLC at useful speeds. | | |
| ▲ | zozbot234 7 hours ago | parent [-] | | $10K should be enough to pay for a 512GB RAM machine which in combination with partial SSD offload for the remaining memory requirements should be able to run SOTA models like DS4-Pro or Kimi 2.6 at workable speed. It depends whether MoE weights have enough locality over time that the SSD offload part is ultimately a minor factor. (If you are willing to let the machine work mostly overnight/unattended, with only incidental and sporadic human intervention, you could even decrease that memory requirement a bit.) | | |
| ▲ | SwellJoe 7 hours ago | parent [-] | | You can't put "SSD offload" and "workable speed" in the same sentence. | | |
| ▲ | zozbot234 6 hours ago | parent [-] | | As a typical example DeepSeek v4-pro has 59B active params at mostly FP4 size, so it needs to "find" around 30GB worth of params in RAM per inferred token. On a 512GB total RAM machine, most of those params will actually be cached in RAM (model size on disk is around 862GB), so assuming for the sake of argument that MoE expert selection is completely random and unpredictable, around 15GB in total have to be fetched from storage per token. If MoE selection is not completely random and there's enough locality, that figure actually improves quite a bit and inference becomes quite workable. |
|
|
|
|
|
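The SSD-offload estimate in the subthread above can be written down as a two-line model. This is only a sketch of that commenter's reasoning, using their figures (~30 GB of active params per token, ~862 GB on disk, 512 GB RAM) and their worst-case assumption that MoE expert selection is uniformly random; under those exact numbers the formula gives ~12 GB per token, a bit below the ~15 GB quoted, which presumably budgets some RAM for KV cache and the OS.

```python
# Worst-case SSD traffic per token for an MoE model partially offloaded
# to disk, assuming expert selection is uniformly random across weights.
def ssd_gb_per_token(active_gb, model_size_gb, ram_gb):
    cached_fraction = min(1.0, ram_gb / model_size_gb)  # share of weights resident in RAM
    miss_fraction = 1.0 - cached_fraction               # share that must be read from SSD
    return active_gb * miss_fraction

# Figures from the comment: ~30 GB active per token, ~862 GB model, 512 GB RAM.
print(round(ssd_gb_per_token(30, 862, 512), 1))
```

Any locality in expert selection only lowers the miss fraction, which is the commenter's point about inference becoming more workable than the worst case suggests.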
| ▲ | nozzlegear 8 hours ago | parent | prev | next [-] |
| I've been using local AI via LM Studio ever since I canceled my Claude subscription. It's obviously slower than Claude on my M1 Studio[†], but like someone else said, I use AI more like a copilot than an autopilot. I'm pretty enthused that I can give it a small task and let it churn through it for a few minutes, while I work on something alongside – all for free with no goddamned arbitrary limits. [†] The latest Qwen 3.6 whatever has been a noticeable improvement, and I'm not even at the point where I tweak settings like sampling, temperature, etc. No idea what that stuff does, I just use the staff picks in LM Studio and customize the system prompts. |
|
| ▲ | politelemon 8 hours ago | parent | prev | next [-] |
Feasibility on commodity hardware would be the real milestone. Running high-end computers is the only way to get decent results at the moment, but if we can run inference on CPUs, NPUs, and GPUs in everyday hardware, the moat should disappear. |
| |
| ▲ | zozbot234 7 hours ago | parent [-] | | You can already run inference on ordinary hardware but if you want workable throughput you're limited to small models, and these have very poor world-knowledge. |
|
|
| ▲ | aleqs 8 hours ago | parent | prev | next [-] |
Indeed, I feel like we are in the early-computer equivalent phase of AI, where giant expensive hardware is still required for frontier models. In 5 years I bet there will be fully open models we'll be able to run on a few thousand dollars of consumer hardware with performance equivalent to Opus 4.7/4.6. |
| |
| ▲ | whattheheckheck 7 hours ago | parent [-] | | You'll never have the power of what they have, though. Cloud capital is insane. So you can run 1 agent locally on $1k to $3k hardware. They can run a fleet of thousands. | |
| ▲ | nozzlegear 6 hours ago | parent | next [-] | | But does one individual need a fleet of thousands of agents? | |
| ▲ | aleqs 7 hours ago | parent | prev [-] | | I think intelligence per compute will go up significantly in the coming years, while the cost per compute will drop significantly. No way to know for sure, so I guess we'll see |
|
|
|
| ▲ | andyfilms1 8 hours ago | parent | prev | next [-] |
Sure, but local AI is still a black box. Models can be influenced by training data selection, poisoning, hidden system prompts, etc. That recent WordPress supply chain hack goes to show that the rug can still be pulled even if the software is FOSS. |
|
| ▲ | ModernMech 8 hours ago | parent | prev | next [-] |
| I love how it's just a tacit understanding that these companies' entire MO is to carve out a territory, get everyone hooked on the good stuff and then jack up the price when they're addicted and captured -- literally the business plan of crack dealers, and it's just business as usual in the tech industry. |
| |
|
| ▲ | root_axis 8 hours ago | parent | prev [-] |
| Not really. The hardware requirements remain indefinitely out of reach. Yes, it's possible to run tiny quantized models, but you're working with extremely small context windows and tons of hallucinations. It's fun to play with them, but they're not at all practical. |
| |
| ▲ | ac29 7 hours ago | parent [-] | | The memory requirements aren't that intense. You can run useful (not frontier) models on a $2-5K machine at reasonable speeds. The capabilities of Qwen3.6 27B or 35B-A3B are dramatically better than what was available even a few months ago. Practical? Maybe not (unless you highly value privacy) because you can get better models and better performance with cheap API access or even cheaper subscriptions. As you said, this may indefinitely be the case. | | |
| ▲ | root_axis 4 hours ago | parent [-] | | > The capabilities of Qwen3.6 27B or 35B-A3B are dramatically better than what was available even a few months ago. Yes, a lot better, but still terribly unreliable and far less capable than the big unquantized models. |
|
|