More signal that the open-weight models should be our destiny as an industry. These proprietary models are being used to usher in more surveillance and gatekeeping across the industry.

▲

roadside_picnic 9 hours ago | parent | next [-]

I have a home server that runs Qwen3.6-35B-A3B through llama.cpp with Open WebUI for the user facing interface.

My teen isn't super interested in AI, but whenever they do feel curious they have their own account they can use on our home network. As far as chatting goes, local models are more than capable of handling standard chat questions, doing research, helping troubleshoot problems, etc. In fact, it was an agent powered by the same model that set up the Open WebUI server and took care of all the account management features through my phone (using Hermes agent).

If you're building AI powered features and using sophisticated agent setups for coding for work, then it makes sense to use SoTA from these providers. But I've been using local models increasingly for personal use and am starting to find them preferable (I run an uncensored, ephemeral model for my own use and it's an entirely different experience than anything you can pay for).

Still haven't cancelled my personal Anthropic subscription, but considering it soon.

▲

jrochkind1 9 hours ago | parent | next [-]

What about local models do you find preferable?

I guess "starting to find them preferable" suggests to me you think they work better, but this is surprising to me so I think I may have misunderstood, so I ask!

Like you're saying they work better than the proprietary models (in what ways?), or you find them mostly good enough and prefer the privacy or cost, or what?

	▲	roadside_picnic 9 hours ago | parent | next [-]
		There are a couple of things, but basically it boils down to the same reason people prefer Linux to Windows/macOS: customization, control and privacy (arguably all of these are really subsets of 'control'). Having full control over how your data is retained, what the system prompt is, which version of the model you're running, etc. leads to a much more consistent experience. For example, for chat sessions, I can't stand the new "let me push back" version of Claude. For my home models I never have to worry about that. There's never a mystery as to whether the model secretly degraded performance, I always know exactly which model I'm using and how well it's utilizing resources etc. Open models also give you full visibility into the reasoning steps, so you never have to guess what the model is thinking. Then when you start getting into things like uncensored/abliterated models we're talking about something you can't even pay for. In case you're unfamiliar, even open local models have guardrails built in. But people in the community have found ways to remove these. One of the things I've found most concerning about AI, which is under-discussed, is the combination of people having personal chats with an agent that both monitors the conversation and refuses to discuss certain topics. This leads to a very deep level of self-censoring I find dystopian. I also have multiple Hermes agents set up, some with local backends, others with open but non-local backends (e.g. Kimi through the API). For some tasks, I've just started to find the local agent tends to work better for the type of tasks I want (maybe it just overthinks less?). I don't use it for coding so much as research tasks and sysadmin stuff, but I've been really happy with the results. Oh, and let's not forget, especially running on a Mac, these local models are basically free to run.
	▲	jauntywundrkind 7 hours ago | parent | prev [-]
		The local models are willing to share their thinking. The Big AI models don't share their thinking, leaving only vague summaries. Having an AI that deliberately cloaks its reasoning, that goes out of its way to act like Searle's Chinese Room experiment, that deliberately conceals information is incredibly gross. I love what I get from Opus or GPT, but mainly I use GLM and it's so starkly apparent how much better it is that it lets me work together with it, that I can nudge it as it works by correcting bad assumptions or clarifying for it, as it works. And... it just doesn't feel icky. It's not a quasi-mystical alien intelligence, which, honestly, gives me strong "this should be destroyed, is unsafe, and feels outright impermissible" vibes. As a coder, seeing thinking saves time and prevents errors. As a civilization, seeing thinking lets people understand what the AI is working with and grounds society in an appreciation for what is happening, keeps us a little moored. Personally, if I were a government, I would not allow it. Recent submission on this, The text in Claude Code’s “Extended Thinking” output is not authentic. https://patrickmccanna.net/the-text-in-claude-codes-extended... https://news.ycombinator.com/item?id=48630535

▲

drusepth 9 hours ago | parent | prev | next [-]

What is an "ephemeral" model in this context?

	▲	roadside_picnic 9 hours ago | parent [-]
		Just running it through `llama-cli` so that there's absolutely no persistent state related to the chat (at least I believe this to be the case).

▲

agumonkey 9 hours ago | parent | prev | next [-]

What kind of machine is it running on ?

	▲	bakies 5 hours ago | parent [-]
		I just started using this model on my Framework Desktop and it's very smart and fast.

▲

rvnx 9 hours ago | parent | prev | next [-]

From a privacy perspective, your objective is to stay away from people who have interest to snoop on your conversations.

So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.

What uncensored model do you recommend using ?

▲

panny 9 hours ago | parent [-]

>From a privacy perspective, your objective is to stay away from people who have interest to snoop on your conversations.

>So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.

That is bonkers. If I were a parent, I would hope my child would trust me more than systems monitored by FBI/NSA/etc. Like, what sort of sick relationship do you have to have with your own family to trust them less than strangers who would sell you into prison slavery for a buck.

	▲	rvnx 8 hours ago | parent [-]
		Private conversations of a teen have low value for FBI/NSA. They have infinite value to their parents. The state isn't going to ground them, shame them at dinner, out them, or pull them out of a relationship, punish them. Parents reading your browsing history and private conversations when you are 14-18 years old (the age of teenagers) is very very creepy, unless there is a specific danger to avoid. It's like if you read their private journal. Adolescents need a private inner world to form an identity, and heavy parental intrusion ("psychological control") is the real distrust. Trust them, they are people, not possessions. You can guide them, but do not store their private messages locally under your control using the excuse of protecting them from NSA. If they trust you, they will tend to tell you upfront the things they have questions about, there is really no need to spy on their thoughts. Same with husband/wife btw.

▲

rib3ye 9 hours ago | parent | prev | next [-]

How many tokens /sec?

	▲	roadside_picnic 8 hours ago | parent [-]
		M3 Max laptop: ~55 tokens/sec. RTX 4090: ~190 tokens/sec. I don't have the number around, but there is a notable latency for pre-fill on the M3; once it's running the delay is negligible. The RTX, unsurprisingly, is all-around superior performance-wise, but I use that computer for gaming and image gen work so I can't dedicate it as a server, and, especially when it's warmer, the heat generated under heavy loads is noticeable.

▲

ai_fry_ur_brain 9 hours ago | parent | prev [-]

> I run an uncensored, ephemeral model for my own use and it's an entirely different experience than anything you can pay for.

Dont. Goon. To. LLMs

▲

fyltr 9 hours ago | parent [-]

Wasn't the parent post referring to 'legitimate' demands? I often use them to get a broad overview of a technical field before reading human stuff on it, and it might be me but those clankers tend to spend half their reasoning on whether they are allowed to reply to my request. Censorship is an annoying waste of capacity for certain use cases, although it certainly has its boons when shipping commercial models.

	▲	ai_fry_ur_brain 4 hours ago | parent [-]
		He was definitely referring to gooning to LLMs.

▲

extr 9 hours ago | parent | prev | next [-]

They are not going to let open weights models with zero restrictions exist dude. They will be regulated like guns, or probably closer to nerve gas or enriched uranium.

▲

pc86 9 hours ago | parent | next [-]

Only if you let them.

	▲	extr 9 hours ago | parent [-]
		I don't know that I want to stop such a thing. It's good that nerve gas is banned. I don't want random people having access to easy-to-follow instructions to make COVID-29.

▲

infamouscow 8 hours ago | parent | prev [-]

The government is not going to enforce this, the game theory does not work in their favor.

The SCOTUS has made it exceptionally clear mathematics and software are protected by the First Amendment. The Atomic Energy Act of 1954 tries to make a very narrow exception for nuclear weapons, but

1. The law has never been challenged in court for being unconstitutional, and

2. It doesn't apply to model weights

Any attempt by the government to suppress open models will meet legal challenges on the grounds of (1) or (2).

Congress could amend the act to include model weights, but that won't prevent legal challenges on the grounds of it being unconstitutional (which it is).

▲

mrits 10 hours ago | parent | prev | next [-]

Either way I don't think this will end well for humanity.

▲

scottyah 9 hours ago | parent [-]

How could it not? I get the whole fear of AI making robots and going anti-human, but after using the tech for a few months now that seems too absurd.

▲

thewebguyd 9 hours ago | parent | next [-]

Because (collective) we don't own the tech. Frontier models are proprietary, their reasoning logic is hidden, and as seen with Fable the government giveth and taketh away on a whim.

Capabilities can be gated behind certification programs, or by money, or any other numerous corrupt and non-corrupt means. Model capabilities can be segregated by pricing tiers, creating an economic underclass that cannot afford access to frontier intelligence.

For humanity to benefit, the tech needs to be open and equally available to all.

▲

jrockway 9 hours ago | parent | next [-]

I agree with this. Computing as a field is the way it is because there is a low barrier to entry. My dad gave me a Tandy 1000 and some programming books, and now I have a very lucrative career. I never took any classes. I never had to beg anyone for permission. I could just get started making things with the minimal investment of a cheap personal computer. (And eventually, an Internet connection. Working with other people is fun!)

In a world where everyone is a Claude controller (something I honestly enjoy!), that goes away. I use hundreds of dollars of tokens a month. Suddenly, the kid in her basement with an unloved computer can't get in on the ground floor. You have to be rich to even get started. That worries me deeply. It's a big change for our field, and I don't think it's a good one.

▲

scottyah 9 hours ago | parent [-]

Did your dad give you a Tandy 1000 or a Cray X-MP/48? Do you really think you need the most top-of-the-line model to learn anything, or will a locally run gemma4 (or whatever it turns into) still get you going just the same as when you were a child?

	▲	jrockway 7 hours ago | parent [-]
		That's true, local models are good. The Tandy 1000 was really only good for a little bit of QBASIC. Still fun! The other thing is that computing in general feels more accessible? While some models are free, you still can't easily make your own model. But I see the argument that you can't really just build your own computer anymore (no one person knows how to make a modern CPU, and you can't do it at home). You are always beholden to society; nothing truly starts in your basement at home. And it didn't in the nostalgia era that I remember either.

▲

axus 9 hours ago | parent | prev | next [-]

AI isn't the problem, concentration of power is the problem. I think we agree!

▲

scottyah 8 hours ago | parent [-]

Your "concentration of power" is just two labs making models that most people prefer the last couple of months. Neither has more access to capital and resources than Google, more ability to pivot quickly than Xai, more access to labor than all of the Chinese labs, etc. How do you keep from a "concentration of power" without just forcing subsets of the population to use a known lesser model, or purposely kneecapping Research and Development at the labs that currently have the best models?

	▲	axus 8 hours ago | parent [-]
		I was agreeing with the parent's conclusion: "For humanity to benefit, the tech needs to be open and equally available to all." Reducing the power of AI / restricting its export / arresting people who "use it wrong" is counter to that.

▲

scottyah 9 hours ago | parent | prev [-]

Do you hate all lessons from humanity's past or just the most important ones? If it takes work from a specific subset of the population and isn't compensated, then my friend, what you advocate for is slavery...

▲

thewebguyd 9 hours ago | parent [-]

Ah yes, I forgot, Linus Torvalds and the thousands of others that built Linux over time are all slaves. Guess someone should probably go rescue them.

▲

scottyah 9 hours ago | parent [-]

None of them were compelled, and nobody is stopping you from running your own LLM generously provided by others. Doesn't mean when linux came out people nationalized Apple and Microsoft.

	▲	thewebguyd 8 hours ago | parent [-]
		The risk I'm talking about isn't nationalization of companies, it's corporate monopolization of frontier intelligence capabilities through capital consolidation and regulatory capture. "Just run your own LLM" ignores the asymmetry of frontier intelligence. You can build an operating system in your garage with just time and cheap hardware. You cannot go build GPT-5. And that's the problem with keeping it proprietary. If the primary cognitive engines of human progress are consolidated within just a handful of closed, proprietary cartels that can gate, alter, and revoke capabilities at will, it creates a permanent economic underclass. The foundational infrastructure of our collective future shouldn't be entirely walled off. Fair compensation for a commercial product doesn't mean monopolization of foundational capabilities.

▲

munk-a 9 hours ago | parent | prev | next [-]

There are two rational objections, I think...

One is the potential for skill rot where AI grows a heavy dependence in new employees and once the real price per token cost is settled on and discoverable (post massive IPOs and probably a while post - not immediately after) we, as a society, are left with a bunch of people dependent on a deeply inefficient technology to maintain software we now view as vital that might severely impede our ability to actually deal with climate change (press X to doubt Bezos).

The second is that the psychological damage of interacting with models in a social context during your formative years is deeply damaging and we've essentially destroyed the ability for a generation or two to actually interact as productive members of society.

Addressing the second issue doesn't necessarily exclude our ability to leverage models for business productivity but it seems unlikely to happen in the current climate without that also happening. I am hesitant to believe in a sudden outbreak of common sense at this point. The first point, could really be a systems collapse trigger - we can argue about the likelihood but denying it as a possibility is excessively naive.

▲

scottyah 9 hours ago | parent | next [-]

Both seem to just point at the WALL-E outcome, summarized as humans outsourcing too much thinking. I just don't see that as an end- just another divide between people. I'm seeing some degradation for sure, but not really an "end".

▲

pc86 9 hours ago | parent | prev | next [-]

What does climate change have to do with anything?

	▲	fyltr 8 hours ago | parent [-]
		there are claims that llms might be taxing on the planet to run BUT that they will solve [some, all] problems including climate change and therefore be beneficial in the long run.

▲

sevenzero 9 hours ago | parent | prev [-]

I agree with the skill drain argument but also think it's a little too dramatic. Most people still can do the shit claude does for them, it just takes them 10x as long.

▲

petre 9 hours ago | parent | prev | next [-]

It would probably just burn more gas and make the climate even worse. Some assholes will get richer in the process.

▲

scottyah 9 hours ago | parent [-]

But "some assholes" is an extremely large, growing group of people. Do you have any idea how much more productive small business owners are now? It's an insane boost for people who didn't want to spend their time on things that are extremely critical for business but not the focus of the business.

▲

hn_acc1 9 hours ago | parent [-]

And people loved "free next day delivery" from Amazon, when it started. It's not quite the same level of service anymore, and membership has gone up in price.

Would these businesses pay 2x? 5x? 10x? What is their breaking point? I'm sure xAI/OpenAI/whoever will find it and charge 0.9x that (eventually). Just look at telecoms / internet access and their rubbish "network congestion" claims to keep raising prices.

	▲	scottyah 8 hours ago | parent [-]
		I still get a lot of free next day, and now sometimes even same day, delivery from Amazon. I doubt the membership price has even matched inflation, but it is certainly well worth it. I can't see any governmental or volunteer organization that would produce even slightly comparable results with the same budget.

▲

hn_acc1 9 hours ago | parent | prev [-]

How can it end well, when it's mostly owned / controlled by narcissistic billionaires who would love to eradicate anyone who so much as looks at them sideways? And who view "mass population reduction" and "I'll get to be a king in my castle, served by peons who depend on my favor to live" as the most desirable outcome of AGI?!?

If even one of these had pledged that all profit goes to end world hunger, cancer research, etc, I could possibly see it - but they haven't. They're all after finding a way to be the biggest, richest asshole possible with the ability to crush anyone in their way..

	▲	scottyah 9 hours ago | parent [-]
		Have you isolated yourself completely from reality? I don't even know where to begin on this. Let's start with the fact that China is pumping out some near-frontier models and open sourcing the weights- and they don't even follow capitalism and the owners aren't billionaires. Really there are like four models in the USA that are "owners/controllers", and only one is even slightly controllable by its CEO, though none of the frontier models can last a week without the support of entire teams. Why on earth would you want to siphon off the proceeds of AI development to (ok my bias is strong here- mostly corrupt) "ideals" like world hunger and cancer research (that probably get more dollars annually than the sum of actual profit any of these companies will ever get). That would just instantly kill the ability to improve AI at all, and the world could possibly be better for a few months?

▲

CobrastanJorji 9 hours ago | parent | prev | next [-]

Someone should start a nonprofit company focused on developing Open AI. I bet we could even get some sensible billionaires to help the effort.

▲

jaredsohn 9 hours ago | parent | next [-]

Maybe one of those trillionaires could help for a bit before leaving to make his own AI model, too.

▲

codedokode 9 hours ago | parent | prev | next [-]

And we are definitely not going to put our users on a watch list DB and send their data to the government?

And how do we prevent Chinese companies from training on our open AI models and offering their models for free?

▲

jrockway 9 hours ago | parent [-]

How does Red Hat prevent Chinese companies from producing a Linux distribution for free? They don't. And yet they still exist.

	▲	rvnx 9 hours ago | parent [-]
		They can't prevent the innovation, competition and engineering, but their lobbying makes sure that the Chinese competition doesn't enter the market, and if it does, with severe obstacles on the way. https://www.ibm.com/policy/contributions-and-expenditures Their biggest customer is the US federal government, taken in aggregate across agencies, IBM is one of the largest federal IT contractors, and deep public-sector and financial-services contracts in the US make it IBM's single largest national market. No individual commercial company comes close to the government's aggregate spend. Now, equivalent product, another company, they want to sell to the government twice cheaper, can they ? nope, it will be IBM winning. Furthermore, according to the lobbyists, China = evil but they forget that a lot of software contains Chinese code.

▲

biraj-rocks 9 hours ago | parent | prev | next [-]

i’d really love to be wrong, i don't think that the economics of it would let it happen.

the potential for wealth creation with AI is so high, and research, pre-training and inference are so expensive, that any open-AI would eventually become OpenAI.

▲

bckr 9 hours ago | parent | prev | next [-]

We could all chip in

	▲	janalsncm 9 hours ago | parent [-]
		Based on recent SEC filings, you’ll soon be able to.

▲

jauntywundrkind 7 hours ago | parent | prev | next [-]

Allen Institute for Artificial Intelligence (ai2) is doing really good open source work in the west. It's awful that the west has so few other pokers in the fire here for nonprofit AI. https://allenai.org/


▲

herodoturtle 10 hours ago | parent | prev | next [-]

I’m curious (and please forgive my ignorance if it’s obvious), are open weight models practically feasible?

I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.

I guess I’m trying to understand the economics of it.

▲

SimianSci 9 hours ago | parent | next [-]

There is an understandable gap between the capabilities of closed models and those of open models. The current difference is primarily expressed in the cost of hardware necessary to sufficiently run an exactly comparable model. A single higher end graphics card running on your average gaming computer is capable of running small to medium models that compare with those of their lab-born counterparts in the small-medium range. But the heavyweight models are still outside the realm of possibility for all but the most well-funded individual.

However, I would highly suggest more people experiment with these smaller models. They are incredibly capable in many ways that many people don't realize.

The perceived capabilities of the larger models are also much less the result of the model having more parameters/training cycles, but rather that they are being run through well-made harnesses, something which the open-source community is rapidly approaching with near-peer solutions of their own.

In short, much of the gap between open-weight models and the larger proprietary models can be considered more of an issue of perception and not an issue of capability. There is a fundamental gap economically, but not so much in capability. The open source community is rapidly closing the gap on these larger labs, especially thanks to the amazing research being freely given openly by well funded Chinese labs.

▲

anigbrowl 9 hours ago | parent | prev | next [-]

Sort of. A full trillion-parameter model needs about $300k of server hardware to run in and a lot of electricity, making it feasible only for very wealthy individuals, but quite practical for businesses and institutions above a certain size...although they in turn would typically gatekeep access.

You can drastically reduce the requirements by running models at a lower bitrate, which somewhat reduces accuracy but not that much - think of the difference between an MP3 vs uncompressed audio. With this and other tricks, you can get high end models down to a size where they can be run on a high spec desktop workstation affordable by an individual or small business.
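As a rough illustration of the bitrate tradeoff described above, here is a minimal Python sketch of symmetric round-to-nearest 4-bit quantization applied to a handful of made-up weights. The function names and the sample values are hypothetical, and this is not how any particular inference tool does it: real quantization formats typically work on blocks of weights with per-block scales and are more elaborate.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [-7, 7] using one shared scale.

    This is plain symmetric round-to-nearest quantization, used here
    only to illustrate the idea; it is an assumption, not a real format.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up example weights (hypothetical values).
weights = [0.12, -0.53, 0.98, -0.07, 0.31]

codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)

# The worst-case rounding error is about half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print("max error:", max_error, "half step:", scale / 2)
```

Each weight now needs only 4 bits plus a share of the scale, at the price of an error of up to about half a step per weight, which is the "somewhat reduces accuracy" part of the tradeoff.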

Obviously I'm heavily oversimplifying here. I think a useful parallel is to consider situations from the past where you would once have required corporate budgets equivalent to the price of a house to run a large database, but over time it became accessible to anyone with the requisite expertise and relatively affordable hardware.

▲

sosodev 9 hours ago | parent [-]

You can run a trillion parameter model with decent quality for far less than $300k. A cluster of 4 AMD AI Max 395+ boards with 128GB unified memory each can be had for around $15k. That would run the 4-bit quant of a trillion param model well enough for personal use. At full use the cluster would only be consuming around 400-500W of power too. That's about the same as one high end graphics card.

That's still a lot of money, but most people don't really need a trillion parameter model. If privacy is more valuable than the frontier capabilities then they could almost certainly get by with much less.

▲

anigbrowl 2 hours ago | parent | next [-]

I literally wrote about running quantized models and how much more affordable it could be in the very next sentence. Please don't reply if you can't be bothered to read the entire comment, it's not that long.

	▲	sosodev 2 hours ago | parent [-]
		I read the comment, thanks. I just disagree with your cost estimate. Even for a small business that needs high throughput they could probably do it for far less than $300k if they aren’t just blindly buying the first big nvidia setup they can.

▲

nijave 6 hours ago | parent | prev [-]

Which model? I see a suspiciously similar post on amd.com running 2 bit Kimi quant on a four node cluster over 5Gbps Ethernet

Assuming math works here although I think there's some caveats depending on the model architecture, 1T 4 bit is 465Gi just for the weights so you wouldn't be able to fit kv cache.

It's showing about 8-9 tk/sec which seems quite slow for something like a web search with result aggregation, although maybe bearable for smaller context stuff

The thing I've been running into with z.ai hosted GLM-5.2 is the 2024 knowledge cutoff. Anything recent requires web augmentation which is more token intensive so low tk/sec hurts even more than a "smarter" model

It seems (somewhat unsurprisingly) open weight models have older knowledge cutoffs.
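The weight-size arithmetic in this comment can be checked with a short Python sketch. It assumes exactly 10^12 parameters, a flat 4 bits per weight with no quantization overhead, and that each "128GB" board offers a full 128 GiB to the model; all three are simplifying assumptions, and the helper names are hypothetical.

```python
GIB = 2**30  # bytes per GiB


def weight_bytes(n_params, bits_per_weight):
    """Bytes needed for the weights alone (no scales, no KV cache)."""
    return n_params * bits_per_weight / 8


def to_gib(n_bytes):
    return n_bytes / GIB


params = 1_000_000_000_000  # assumed: exactly one trillion parameters

weights_gib = to_gib(weight_bytes(params, 4))

# Assumption: 4 boards x 128 GiB, with every byte usable by the model.
# In practice the OS and runtime take a share, so real headroom is lower.
cluster_gib = 4 * 128
headroom_gib = cluster_gib - weights_gib

print(f"4-bit weights: {weights_gib:.1f} GiB")
print(f"headroom under these assumptions: {headroom_gib:.1f} GiB")
print(f"2-bit weights: {to_gib(weight_bytes(params, 2)):.1f} GiB")
```

The 4-bit figure comes out at roughly 465.7 GiB, matching the number quoted above. Whether the remaining memory is enough for a KV cache depends on the model architecture and context length, which this sketch does not model.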

▲

sosodev 5 hours ago | parent [-]

I don’t have any particular model in mind, sorry. My data is just rough estimates based on my experience with a single node setup. You might need to opt for a 2 or 3 bit model to get the full context window. The KV cache memory consumption as well as overall performance will be heavily dependent on the model’s architecture. The performance too will depend a lot on the inference server chosen and its configuration. I suspect a sub-agent running a much smaller model would be the ideal way to get the latest knowledge via web search and summarization.

I’m not trying to say that this would be a great experience or really compete with just buying a subscription to the top models. Rather I just wanted to point out that $300k is an absurd estimate for a trillion param model meant for personal use.

▲

nijave 5 hours ago | parent [-]

I imagine a smaller single node model would have a significantly better experience at significantly lower cost. When I was poking around with infra estimates it seemed the main issue/cost was once you crossed from single-node to multi-node. You need _a lot_ of bandwidth if the weights are sharded. Like Tbps of bandwidth. The closest reasonable thing I've heard of for local multi-node is exo on macos using thunderbolt interconnect.

	▲	sosodev 2 hours ago | parent [-]
		I think it really just depends on your goals. Slow tokens per second is fine by some people if they cost a fraction of a single node setup that can run a trillion param model. If you’re actually running a small business and want to have multiple users getting a good experience in parallel then yeah I think you need a single node. At that point you can afford it I suppose. I don’t know what the scaling for multiple strix halo boards looks like in practice. From what I understand each server has to process the model in serial. Meaning server A has 1/4 the weights and sends server B the results to process and so on. So you don’t get compute scaling just memory scaling.
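The "memory scaling but not compute scaling" point can be sketched with a toy Python model. It assumes the layers are split evenly across nodes, that a single request passes through the nodes strictly one after another, and that each hand-off costs a fixed delay; the function names and all numbers are made up for illustration and do not come from any benchmark.

```python
def single_stream_tokens_per_sec(full_model_seconds, n_nodes, hop_seconds):
    """Toy model of serial (pipeline-style) decoding for one request.

    full_model_seconds: assumed time for one token if a single node
    could hold and run every layer. Each node runs 1/n of the layers,
    then hands the activations to the next node.
    """
    per_stage = full_model_seconds / n_nodes
    token_latency = n_nodes * per_stage + (n_nodes - 1) * hop_seconds
    return 1.0 / token_latency


def total_memory_gib(per_node_gib, n_nodes):
    """Memory does add up across nodes."""
    return per_node_gib * n_nodes


# Hypothetical numbers: 0.1 s per token on one node, 2 ms per hand-off.
one_node = single_stream_tokens_per_sec(0.1, 1, 0.002)
four_nodes = single_stream_tokens_per_sec(0.1, 4, 0.002)

print("1 node :", one_node, "tok/s,", total_memory_gib(128, 1), "GiB")
print("4 nodes:", four_nodes, "tok/s,", total_memory_gib(128, 4), "GiB")
```

In this model the four-node setup has four times the memory but slightly lower single-stream speed, because the stages still run in sequence and the hand-offs add latency. Serving several requests at once could keep more nodes busy, which the sketch does not cover.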

▲

roadside_picnic 9 hours ago | parent | prev | next [-]

See my comment to parent. I've been using local LLMs for practical, personal tasks for a few months now very successfully.

You can run fantastic local models if you have either:

- M-series Apple device with ideally >= 24GB of VRAM

- RTX [345]090 GPU

I'm fortunate enough to have both and use an M-series laptop as basically a persistent server (I don't use it much and when traveling typically just use my work laptop). My desktop doesn't act as a persistent server but I fire up llama.cpp on it all the time for quick chat sessions.

If you have one of the above devices and can dedicate it as server there are additional layers of tooling you can use that dramatically improve the experience. In particular Open WebUI allows you to add tons of useful tools (image gen, web search, code eval, etc), and agent harnesses like Hermes can make the current gen small models very capable. I have an agent in chat on my phone that basically handles all the sys-admin for the server it runs on.

▲

hn_acc1 8 hours ago | parent [-]

What about RTX 3080? Too little VRAM?

	▲	roadside_picnic 8 hours ago | parent [-]
		In addition to models getting better, the quantization methods have also gotten much better. If you already have an RTX 3080 it's absolutely worth the time to just mess around and see how it does, experiment with different quants that fit in your VRAM. If you're purchasing I would recommend coughing up the extra cash for the 3090. If you are experimenting it's worth mentioning that the harness/tooling is very important to getting a solid experience. Hermes agent is great for running helpful agents and Open WebUI can really make the experience feel on par with paid chat interfaces. A reasonable halfway step is to pay for an open model through the provider or OpenRouter. You'll get many of the benefits (especially around pricing) without needing to shell out on hardware before deciding if you like the way these models work.

▲

KronisLV 8 hours ago | parent | prev | next [-]

> I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.

Presently they trail SOTA by about 6-12 months, not on par (average across everything they do).

DeepSeek V4 Pro with Max reasoning is very affordable even if you pay per-token, this month I pushed about 486 million tokens through it (I will admit that >95% was cache hits, for agentic development pretty typical) and it cost me about 8 USD in total. Meanwhile with Opus or even Sonnet if I had to pay API prices, I would be a more sad camper. That model makes a lot of stupid things though, so not ideal.

Meanwhile GLM-5.2 that came out is also quite capable and is near Opus in many tasks, all while their coding plan is more cost effective than Anthropic's: https://z.ai/subscribe

I will still stick with Anthropic but am considering downgrading from Max 5x to Pro, which will change the monthly expenses from around 108 EUR down to <20 EUR (they have a discount too if you pay for a year up front), and will probably get the yearly GLM Pro plan, which should decrease my yearly expenses from around 1300 EUR total to about 750 EUR total while still giving me a fairly decent setup.
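The yearly numbers can be checked with a few lines of arithmetic. This sketch uses only the rounded figures from the comment, treats "<20 EUR" as an upper bound of 20, and ignores the yearly-payment discount. The implied GLM figure is simply what is left over from the stated 750 EUR target, not the plan's actual price.

```python
# Yearly cost comparison using the rounded figures from the comment.
MAX_5X_MONTHLY_EUR = 108   # current Anthropic Max 5x spend per month
PRO_MONTHLY_EUR = 20       # upper bound; the comment says "<20 EUR"
TARGET_YEARLY_EUR = 750    # stated total after switching

current_yearly = MAX_5X_MONTHLY_EUR * 12               # the "around 1300" figure
pro_yearly = PRO_MONTHLY_EUR * 12                      # Pro plan without discount
implied_glm_yearly = TARGET_YEARLY_EUR - pro_yearly    # remainder, not a list price
yearly_savings = current_yearly - TARGET_YEARLY_EUR

if __name__ == "__main__":
    print(f"current:      {current_yearly} EUR/year")
    print(f"Pro only:     {pro_yearly} EUR/year (upper bound)")
    print(f"implied GLM:  {implied_glm_yearly} EUR/year (derived remainder)")
    print(f"savings:      {yearly_savings} EUR/year")
```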

For the consumer, that is doable and practical.

For the people actually running these models, who knows - at least DeepSeek and others are trying to make the models more efficient so the numbers are more feasible.

Also have run Qwen3.6 35B A3B on prem and it kinda sucks. Way better than models that size a year ago, but still lags behind Sonnet and also DeepSeek V4 Flash due to the size limits. Plus, to even run it myself I'd need a pretty beefy setup, most likely a pair of Intel Arc Pro B70s with 32 GB of VRAM each that I could still run off of my PSU, but the actual model output would be kinda bullshit and I'd have to spend an unpleasant amount of time fixing it.

▲

hatthew 9 hours ago | parent | prev | next [-]

I'm also curious, specifically about the cost of training vs inference, and comparing that to other industries that can have high R&D costs. My instinct says that open weights aren't feasible because of the obvious issue where there is no incentive to develop your own model rather than just taking someone else's model. However, I could see a scenario where a hardware company designs a model that is open weights but optimized strongly for their own proprietary hardware, cutting their costs of inference low enough to be competitive with a hypothetical other company that doesn't have any R&D expenditure.

▲

sosodev 9 hours ago | parent | prev | next [-]

It depends entirely on what you want to do and think is feasible. Small models can almost certainly run on the computer that you already have. They can do good tool calling.

▲

epolanski 9 hours ago | parent | prev | next [-]

Yes they are: you can use Qwen, DS4 Pro and GLM 5.2 if you have the hardware to do so.

They are not SOTA in various ways but they have better economics.

▲

waffletower 9 hours ago | parent | prev | next [-]

If attractive, cloud providers could develop open models with their own investment, and sell hosted access as a business model. While Google checks these boxes, I haven't seen much marketing focus from Google on their open models (Gemma) coupled with hosting. Groq could conceivably train its own models, but Groq's business model hosts open models trained by other organizations (GPT OSS, Qwen 3, Llama 4 are currently the prominently advertised models on their site... which seems out of date to me).

▲

andrewstuart2 9 hours ago | parent | prev [-]

I hope/wonder if it will go the way computers did. We may learn to more effectively build RAM or parallel compute, and use it more effectively, in the coming decade in such a way that we can democratize more and more like we did with processors to the point that they're ubiquitous.

▲

ai-x 9 hours ago | parent | prev | next [-]

I'm happy to give my identity to Anthropic and crush my competition with irrational fear about privacy and personal data. This is a serious competitive advantage and a moat.

▲

chinathrow 9 hours ago | parent | next [-]

Is this satire? I really can't tell.

▲

card_zero 9 hours ago | parent [-]

Bragging about a strategy isn't very strategic. So the comment's purpose is something else.

	▲	ai-x 7 hours ago \| parent [-]
		Warren Buffett brags about his strategy. Jeff Bezos brags about his strategy. The reason they can brag is, even if it's simple, competition doesn't have the culture to copy/follow it. (My post is literally downvoted) Losing privacy has ZERO downsides for ordinary people. Nobody cares about your data. Literally, put all your life on a YouTube channel and see how many views that video will get. ZERO. Irrational fears (especially if they're conspiratorial) => Sub-optimal decisions. Just like Buffett and Bezos, my strategy is simple -- go against firms that are making irrational decisions. It's the same framework to adopt cloud, AI and many frontier technologies and disrupt

▲

sevenzero 9 hours ago | parent | prev [-]

>This is a serious competitive advantage

Given they have laughable uptime and I have yet to find a useful project mostly written by Claude... I doubt it.

▲

johndhi 9 hours ago | parent [-]

Huh? Limited uptime means you can't write projects with it? I assume downtime means you can't host on it ...

	▲	sevenzero 9 hours ago \| parent [-]
		I won't buy expensive hardware to self host a model that's outdated within 2 months. Also, yeah, uptime is important if you don't self host.

▲

baq 9 hours ago | parent | prev [-]

More signal this won’t happen without some serious social unrest, not garden variety Jan 6 events… and the window is closing rapidly - when this tech gets sufficiently advanced there won’t be a place to hide.
