evo_9 4 hours ago

Every AI subscription is a ticking time bomb for the frontier provider; within a few years we will be running local models as good as today’s frontier models with almost no cost burden. The floor will fall out of the enterprise market for all the frontier companies.

crazygringo 3 hours ago | parent | next [-]

> within a few years we will be running local models as good as today’s frontier models with almost no cost burden

Based on what? The RAM requirements alone are extraordinary.

No, running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.

crystal_revenge an hour ago | parent | next [-]

> Based on what?

I take it you haven’t actually run any of the current gen local models?

They all fit on fairly accessible hardware, and their performance is at least on par with what I was paying for last year.

I have one of my agents running entirely on a local model on an MBP, and it has repeatedly shown it’s capable of non-trivial tasks.
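For anyone curious what the wiring looks like, it's nothing exotic. A minimal sketch, assuming a local server (llama.cpp or Ollama style) exposing an OpenAI-compatible endpoint; the model name and port here are placeholders, not my actual setup:

    from openai import OpenAI

    # Point the standard OpenAI client at a local, OpenAI-compatible server
    # (e.g. `ollama serve` or llama.cpp's llama-server). The key is ignored locally.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="local")

    def ask(prompt: str) -> str:
        """One turn against the local model; an agent loop just wraps this
        with tool-call parsing and a scratchpad."""
        resp = client.chat.completions.create(
            model="local-model",  # placeholder for whatever checkpoint is loaded
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    print(ask("Summarize the last test failure and propose a fix."))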

Playing around with another, uncensored, local model on my 4090 desktop has me finally thinking about canceling my personal Anthropic subscription. Fully private, uncensored chat is a game changer.

For work it’s still all private models, but largely because, at this stage, it’s worth paying a premium just to be sure you’re using the best, and it saves the time of managing our own physical servers. But if we got news tomorrow that Anthropic and OpenAI were shutting down, a reasonable setup could be figured out pretty quickly.

Leynos 39 minutes ago | parent [-]

What kind of useful context window are you getting on a 4090, out of curiosity?

crystal_revenge 22 minutes ago | parent [-]

256k tokens for both the MBP and the 4090
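For a rough sense of why long context is the expensive part: the KV cache scales linearly with context length. A back-of-the-envelope sketch with assumed architecture numbers (not the specs of any particular model); grouped-query attention and KV-cache quantization are what keep this manageable on a 24GB card:

    def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
        """KV cache = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
        return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 2**30

    # Illustrative ~30B-class model with grouped-query attention
    print(kv_cache_gib(48, 8, 128, 256_000))                     # ~47 GiB at fp16
    print(kv_cache_gib(48, 8, 128, 256_000, bytes_per_elem=1))   # ~23 GiB with an 8-bit cache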

alsetmusic 3 hours ago | parent | prev | next [-]

Local models are 6 to 18 months behind frontier. Even if a cloud model performs better, it's clear that local is catching up.

alecco 2 hours ago | parent | next [-]

> Local models are 6 to 18 months behind frontier.

I wish this were true, but it is not. And I am working on open-source models, so if anything I would have a bias towards agreeing with you.

Frontier closed models (GPT/Claude) are pulling further ahead of everybody else. Even Google, once the king.

Your claim is a meme coming from benchmark results, and sadly a lot of models are benchmaxxed: Llama 4, and most notably the Grok 3 drama with a lot of layoffs. And Chinese big tech... well, they have some cultural issues.

"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"

https://xcancel.com/N8Programs/status/2044408755790508113

---

But thank god at least we have DeepSeek. They keep releasing good models in spite of being so seriously resource constrained. Punching well above their weight. But they are not just 6 months behind, either.

crystal_revenge 38 minutes ago | parent | next [-]

I’ve worked professionally in the open model space for 3 years, and up until 2 months ago I would have agreed with you. But it’s empirically not the case today. These models (combined with a good harness) have dramatically improved in both power and performance.

Gemma 4 was a major improvement in self-hostable local models, and Qwen-3.6-A34B is a beast; it runs great on an MBP (and insanely well on a 4090).

The biggest lift is combining these models with a good agent harness (I personally prefer the Hermes agent). But I’ve found in practice they’re really not benchmaxxing. I’ve had these agents successfully handle a few non-trivial research projects that I wouldn’t have been able to accomplish as well even last year.

When you add in the open-but-not local models, Kimi, GLM, Minimax, you have a lot of very nice options. For personal use anything I don’t use local models for I give to my Kimi 2.6 powered agent.

dools 2 hours ago | parent | prev | next [-]

Kimi k2.6 is about on par with GPT 5.2 so I’d say open weight models are about 6 months behind.

cbg0 2 hours ago | parent | next [-]

The Q4 quantization requires about 600GB of RAM without context, not exactly consumer hardware friendly.
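The arithmetic behind that number, for anyone who wants to sanity-check it (the parameter count and bits/weight here are rough assumptions, not official figures):

    def weight_memory_gb(params_billions, bits_per_weight):
        """Memory for the weights alone: parameters * bits / 8. KV cache comes on top."""
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    # Assumed: a ~1T-parameter MoE at ~4.5 effective bits/weight (Q4 plus quant overhead)
    print(weight_memory_gb(1000, 4.5))  # ~560 GB before any context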

janderland 2 hours ago | parent | prev [-]

Has Kimi found a way to vastly reduce the amount of VRAM required without running at 3 tokens per second? That’s the real concern.

tyre 2 hours ago | parent | prev [-]

The Chinese models should stay close, just on a lag. They’re doing a ton of distillation that, realistically, I’m not sure the American frontier labs can stop.

alecco an hour ago | parent [-]

US labs got tough on "adversarial" distillation [1]. I suspect that's one of several reasons why Chinese big labs are lagging again.

[1] US AI firms team up in bid to counter Chinese 'distillation' (Apr 7) https://finance.yahoo.com/sectors/technology/articles/us-ai-...

__s 3 hours ago | parent | prev | next [-]

You still need the hardware

I've got a 128GB Strix Halo staying warm at home; it has nothing on the top models with big budgets. It's a good supplement to low-end plans for offloading grunt work / initial triage.

manmal 3 hours ago | parent [-]

Have you looked into DwarfStar 4?

__s 2 hours ago | parent [-]

Been away from home for nearly a month, so was mostly going off Qwen 3.5 122b-a10b (Q4?) / Qwen 3.6 35b-a3b (Q8) / Gemma4 31b (Q8)

Thanks for the suggestion tho, a tool by antirez is always going to pique interest. I'll check it out when I'm finally home again.

Tho it says Metal / CUDA, so it doesn't seem friendly to a Linux AMD system.

manmal 40 minutes ago | parent [-]

His quant that fits into 128GB looks interesting for the DGX Spark as well, IMO.

greesil 3 hours ago | parent | prev | next [-]

How do you know this? I'm not trying to attack your statement, I am genuinely curious how anyone knows anything about model performance outside of benchmarks that are already in the training set.

scragz 3 hours ago | parent [-]

Using them, you kind of get a feeling for skill level and can extrapolate that better than juiced benchmarks.

lukeschlather 3 hours ago | parent | prev | next [-]

It is not getting easier to obtain hardware that can run models sufficiently useful to undercut frontier models. If anything, the cost of such hardware has gone up 25% or more just in the past 6 months.

aleqs 2 hours ago | parent [-]

I think hardware prices will come back down once we start seeing more efficiency improvements in models and hardware, and once more people and companies self-host models (which seems to be happening more and more these days). I think the massive infra/hardware expenditures of OpenAI and the like are going to end up unnecessary, leading to hardware price drops.

t-sauer an hour ago | parent [-]

If companies decide to self-host, wouldn't that drive the demand and therefore prices up? Most companies currently do not have the needed infrastructure.

aleqs 28 minutes ago | parent [-]

I think companies will self host (including on rented hardware) even if it's more expensive, and that, along with efficiency improvements, will drop demand for big AI. I think big AI is overspending on hardware/datacenters at the moment.

calvinmorrison 3 hours ago | parent | prev [-]

If that's true - and in 6 or 12 months I can run locally what I have today - it might not be worth paying Anthropic.

nine_k 2 hours ago | parent | prev | next [-]

> shared, dedicated hosted hardware at full utilization

I must say that the largest dedicated hosted hardware providers now, like Amazon or Google, to a large extent do not produce the software they offer as a hosted solution (Linux, Postgres, Redis, Python, Node, etc.). Similarly, I'm not sure the producers of the frontier models are going to keep their lead as the service providers for the most widely used models. They would need quite a bit of an edge over open-weights models.

Also, models are given very sensitive data to process. For large organizations, the shared dedicated hardware may look like a few (dozens of) racks in a datacenter, rented by a particular company and not shared with any other tenants.

dandellion 20 minutes ago | parent | prev | next [-]

> The RAM requirements alone are extraordinary.

At the same time, $100 a month is A LOT of RAM.

harrall 2 hours ago | parent | prev | next [-]

You can now buy 128 GB unified memory computers from AMD as commodity.

They’re still pricey, the world is still scaling up memory production, and a lot of code isn’t yet built for AMD, but we went from the Wright brothers’ first airplane to jet engines in 27 years.

I’m not sure it’s “only a few years away,” but we are sure moving there fast.

nine_k 2 hours ago | parent | next [-]

> first airplane to jet engines in 27 years.

Nitpick: more like 36 years, from Wright Flyer in 1903 to Heinkel 178 in 1939. Still quite impressive.

Traubenfuchs 2 hours ago | parent | prev [-]

I believe the same thing but keep repeating the question: Then what are all the datacenters for?

moregrist 2 hours ago | parent | next [-]

Non-cynically: the frontier providers have a projection for demand.

Cynically: it’s become an executive-level gpu measuring contest. If you’re not making huge commitments on data centers, you can’t be a serious player.

Realistically: It’s a mix of the two. The recent Claude caps for agentic usage suggest that demand exceeded their immediate compute supply. That they can alleviate it with additional capacity from the existing and small-ish xAI facility suggests that either demand may not be rising quite as fast as anticipated, that they’re okay in the short term until more capacity comes online, or a mix of both.

Open questions:

1. At what price point does demand fall, and are the frontier providers overall profitable before that price point?

2. At what price/performance point do on-prem local models make more sense than cloud models?

harrall 2 hours ago | parent | prev | next [-]

I print documents and photos at home regularly but I still contract out to dedicated print shops.

The print shop can’t replicate the practicality of local printing and I can’t replicate their scale of investment. Both coexist perfectly.

nnoremap 2 hours ago | parent [-]

Print-outs are a physical good. Tokens aren't.

bluGill an hour ago | parent [-]

They are both fungible. You can replace one with the other.

chris_money202 2 hours ago | parent | prev [-]

Agents

simooooo 29 minutes ago | parent | prev | next [-]

Qwen 3.6 is virtually indistinguishable from Claude on my 5090

iwontberude 2 hours ago | parent | prev | next [-]

I strongly disagree. Humans are so insanely well incentivized here, with trillions in market share at stake, to make local AI good enough, and that’s the only benchmark they need.

SkiFire13 an hour ago | parent [-]

Are they? I don't believe there's that big of a market for local AI. Most people don't care that much, and you'll most likely lose the advertising revenue.

GenerWork 25 minutes ago | parent [-]

>I don't believe there's that big of a market for local AI. Most people don't care that much,

I agree that the market for local AI is basically limited to nerds at this point, but that's because nobody's really explained why local AI is a good thing, and also because the vast majority of people need the $20 paid plan at most. How much time and money would it take to get something half as good as OpenAI's products running locally?

mycall 6 minutes ago | parent [-]

It will take another [human] generation before AI is well integrated into everyone's daily lives, to the point where people will expect a local model handling things for them. I don't think the killer app has arrived yet (OC is a hint of what is to come).

leptons 2 hours ago | parent | prev [-]

>running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.

That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually cost, your comment would likely be very different.

jazzyjackson 2 hours ago | parent | next [-]

No, it's economies of scale, and I don't understand where anyone is coming from who thinks they'll be better off buying their own hardware. Why would you get a better deal on MATMULs/watt than the cloud providers?

salawat 2 hours ago | parent | next [-]

Another victim of Goldratt's Theory of Constraints. Some things are more important to optimize for than MATMULs per Watt. What that is I leave as an exercise to the student. May you realize what it is before it is too late.

jazzyjackson an hour ago | parent [-]

Some individuals will choose $10,000 hardware so they can keep freedom and privacy, and that's well and good. My point is just that freedom and privacy are not what wins market share, and hence, IMHO, local LLMs are not going to catch up to and surpass frontier models like some in this thread claim.

esseph an hour ago | parent [-]

> freedom and privacy is not what wins marketshare

Digital sovereignty laws may mandate/remove access to LLMs of other countries on economic and national security grounds.

esseph an hour ago | parent | prev [-]

Within 5-10 years you're going to see a box like one of those AMD Halo nodes running homes.

They'll be controlling lights and temperature, they'll be adding calendar reminders that show up on your phone and your fridge. Your phone and devices might sync pictures and videos there instead of the large cloud providers. They'll also be a media server, able to stream and multiplex whatever content you want through the home. They'll also be a VPN endpoint, likely your home router, maybe also a wifi access point.

I think this makes quite a bit of sense. I don't think they'll be ubiquitous, but they could be.

This distributes the power demand to where local solar generation can supplement it, gives the home user a lot of control, and takes ownership of the user's data back from big tech.

Maybe I'm imagining things but this is what I think is coming.

It's the LLM/data heart of the home. A useful digital tool.

4 minutes ago | parent [-]
[deleted]
scheme271 2 hours ago | parent | prev [-]

We don't know the parameter counts, but it probably takes at least an H100, and possibly several, to run a SOTA model. Given the pricing ($25k+ per H100, plus the hardware to run it) and power (700W per H100, plus the hardware around it), I don't see how anyone except a largish company can afford to run this.

sshumaker an hour ago | parent [-]

Are you serious? It’s multiple nodes to run a frontier model (a node is 8x GPUs), and they aren’t running on H100s. You are looking at 32+ GPUs.

adamgordonbell 4 hours ago | parent | prev | next [-]

Or put another way, the frontier models are very quickly depreciating assets, because of the competition in the market.

They have to keep getting better to stay ahead of each other and open weight.

Which means it's the opposite of a time bomb; the article has it completely backwards: tokens at the current level of reasoning will continue to get cheaper.

I'm not sure 'local' will be the end state, as hardware needs are high. But certainly competitive forces tend to push profit margins toward zero.

Extended discussion on this topic:

https://corecursive.com/the-pre-training-wall-and-the-treadm...

airstrike 4 hours ago | parent [-]

Well, it's a timebomb for the companies who get paid per token, so the parent is right and TFA is probably wrong

slashdave 3 hours ago | parent | prev | next [-]

> within a few years we will be running local models as good as today’s frontier models

I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

majormajor 3 hours ago | parent | next [-]

> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

The big question I'd be asking if I were investing in one of the big players is whether those changes are "it can do 99% instead of 97% of the tasks a user will throw at it" (at which point going local and taking back cost control/ownership makes a lot of sense, especially for companies) OR "it will fully replace a human with better output"?

I already don't need Opus for a lot of my tasks and choose instead faster/cheaper ones.

The former is a company that's gonna be trying to sell mainframes against the PC. The latter is a company that is in potentially huge demand, assuming the replaced humans end up with other ways of getting money to still be able to buy stuff in the first place. ;)

iwontberude 2 hours ago | parent [-]

Exactly the right argument. Local LLMs don’t need to outrun the bear (outperform data centers); they only need to outrun their friend (total cost of ownership).

bombcar 2 hours ago | parent [-]

[dead]

comfysocks 2 hours ago | parent | prev | next [-]

> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

But even if scaling plateaus for the frontier models, maybe distillation will improve to the point where smaller more manageable models can reach the same plateau. That would be great for local.

christopherwxyz 3 hours ago | parent | prev [-]

I would readjust your convictions.

We are only 2-4 years away from consumer grade immutable-weight ASICs.

slashdave 3 hours ago | parent [-]

We are discussing how rapid development has been, and now you want to freeze your model in silicon?

nixon_why69 2 hours ago | parent | next [-]

Why not have a bunch of SRAM and various operations like "Q4 matmul" in silicon? Model weights and even architectures could still evolve on a platform like that.
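To make "Q4 matmul" concrete: the core op is just dequantize-and-multiply. A toy NumPy sketch with per-group scales (a real kernel fuses the steps and never materializes the float weights):

    import numpy as np

    def q4_matmul(x, q_weights, scales, group_size=32):
        """Dequantize 4-bit weights (stored as values 0..15) with per-group scales,
        then do a plain float matmul. Real hardware would fuse these steps."""
        w = q_weights.astype(np.float32) - 8.0  # re-center the 4-bit range
        w = (w.reshape(-1, group_size) * scales.reshape(-1, 1)).reshape(q_weights.shape)
        return x @ w

    rng = np.random.default_rng(0)
    in_f, out_f = 64, 32
    qw = rng.integers(0, 16, size=(in_f, out_f), dtype=np.int8)
    sc = (rng.random(in_f * out_f // 32) * 0.01).astype(np.float32)
    x = rng.standard_normal((4, in_f)).astype(np.float32)
    print(q4_matmul(x, qw, sc).shape)  # (4, 32)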

ac29 2 hours ago | parent | next [-]

Doesn't "a bunch of SRAM" top out at maybe a few gigs per chip (with zero area used for logic)? You'd need an order of magnitude more to fit even a fairly weak general-purpose LLM.

throwa356262 2 hours ago | parent | prev | next [-]

I believe that is what NPUs are.

The issue is the huge amount of DRAM and high bandwidth these models require.

2 hours ago | parent | prev [-]
[deleted]
rogerrogerr 3 hours ago | parent | prev | next [-]

Genuine question from a place of ignorance: what in the silicon pipeline makes it take 2-4 years to produce chips with a new model on them? Curious what the process bottleneck is.

jazzyjackson 2 hours ago | parent | next [-]

Without being an insider, I imagine that most global fab capacity is contracted out several years in advance.

You might be interested in the Tiny Tapeout project, which guides you through the process of getting your own design etched on silicon. If you only need larger features and not the next-gen single-digit-nanometer stuff, you may not be so supply constrained.

https://tinytapeout.com/

pjc50 2 hours ago | parent | prev [-]

I think you could get it down to three months between weight changes, if you can encode it in metal layers only. The remaining limits are the fab lead time, and the cost of a metal respin (hundreds of thousands to millions of dollars depending on process).

dangus 2 hours ago | parent | prev [-]

If the silicon costs $200-300 and the company throws it away every two years that’s cheaper than a subscription.

Also, how many companies will just buy an M6/M7 MacBook Pro with 32GB+ of RAM in a couple of years and get “free” AI along with the workstation they were going to buy anyway?

stingraycharles 3 hours ago | parent | prev | next [-]

The economics of local AI just don’t make sense. A model like Opus is - supposedly - something like 5T parameters, which likely means something like 3TB of GPU memory.

Local deployments never reach the utilization that cloud providers get (80%+), and cloud is always going to be much more cost-efficient than local for this reason.

lumost 3 hours ago | parent | next [-]

Capex, opex, quality, and volume are tricky things to balance. On balance, PC/mobile are cheaper to operate than equivalent cloud and on-prem deployments.

It’s not unreasonable to suppose that in 2 years’ time an Opus 5-quality model will be etched into silicon for high-performance local inference. Then you just upgrade your model every 2-3 years by upgrading your hardware.

jazzyjackson 2 hours ago | parent [-]

I haven't been following anyone baking models into ASICs. Is it not still necessary to pack just as many transistors onto a chip? Whether it's an NPU or GPU, ASIC or not, you still need to hold hundreds of gigabytes in memory, so how is it cheaper to bake it onto custom silicon than to run it on commodity VRAM? (Asking because I don't know!)

lumost an hour ago | parent [-]

Not my area either! But my understanding is that there are more efficient methods of representing static numbers when you can skip the vram lookup.

https://taalas.com/ is an example startup in this area, claiming 16k tok/s on an ASIC for Llama 8B. Qwen has a 27B model at Opus 4.5 quality.

jazzyjackson 30 minutes ago | parent [-]

Neat, thanks for the link

majormajor 2 hours ago | parent | prev [-]

Running local applications is less efficient than thin clients to the cloud generally, not just in LLMs. The trick is that you can get to the point where it's effective enough, and affordable enough, that the control and availability factors become dominant.

stingraycharles 2 hours ago | parent [-]

My point is that you will always get much more value / $ by using cloud based solutions.

sroerick 22 minutes ago | parent | next [-]

I don't know that this is true. The cloud companies are making money, and inference is kind of just "hosting an inference server and trying to keep it humming 24/7."

But in many cases self hosted or dedicated boxes are cheaper than cloud.

majormajor 2 hours ago | parent | prev [-]

I just don't see how that's different from claiming you get more value by giving all your employees the most stripped-down Chromebook-type devices and running everything else in the cloud than by giving them "proper" laptops with local apps.

It's a measure of a very thin sort of "value/$" that excludes a lot of other things that could be of value to a business, like control, predictability, and availability.

Thin clients have been going away for a long time. The trend has been to continue to push higher levels of compute into ever-smaller and ever-more-portable devices.

vb-8448 3 hours ago | parent | prev | next [-]

> within a few years we will be running local models as good as today’s frontier

Unless there is some important breakthrough in hw production or in model architecture, it's quite the opposite: bigger, more expensive and more energy-intensive hw is needed today compared to 1 or 2 years ago.

evgen 3 hours ago | parent | next [-]

I can run qwen3.6-27b on a four-year-old MacBook Pro, and it dominates ChatGPT-4o (the frontier model from 2 years ago) and is competitive against early ChatGPT-5 versions. We are also getting a lot smarter about using and deploying these local models. Your entire AI stack from two years ago would be absolutely crushed by today's local LLMs on a high-end local inference system combined with a good modern coding agent.

vb-8448 22 minutes ago | parent [-]

Today's open-weights frontier models cannot run locally unless quantization is used. DeepSeek v4 pro requires almost 1 TB of RAM in INT4.

I highly doubt there will be consumer-grade HW to run it in 2 years either. And DeepSeek v4 pro is not even close to OAI or Anthropic frontier models.

chermi 3 hours ago | parent | prev | next [-]

Per frontier token. You're not calculating the cost of a fixed-quality asset here. Old hw running non-frontier models will be very valuable. In fact, we have two direct examples: older server GPUs actually appreciating, and the very obvious fact that not everyone always uses MAX FULL EFFORT BEST MODEL no matter what.

2 hours ago | parent [-]
[deleted]
ls612 3 hours ago | parent | prev [-]

As good as today’s frontier. Gemma 4 today is roughly equivalent to the frontier a year and a half ago at gpt 4o tier.

antisthenes 3 hours ago | parent [-]

What's the cheapest PC you can buy today that will comfortably run Gemma 4 and everything else you want it to run at the same time?

And how many tokens would that buy?

ls612 3 hours ago | parent [-]

I run it on my 4-year-old MBP and get 10 tok/s. With the RAM shortage, buying anything new today is a nightmare, but anyone with a reasonably modern Mac could probably run it at Q6. It is mostly a toy, as 4o-tier models weren’t really suitable for real work IMO, but at least it won’t ever give me a refusal.

jazzyjackson 2 hours ago | parent [-]

At 10 tok/s, are you using it interactively, or do you submit a prompt and come back to it later? I always thought it would make sense to just do conversations over email, asynchronously; the model can take all the time it needs and get back to me when it has an answer.

ls612 an hour ago | parent [-]

10 tok/s is around the borderline of interactive use being pleasant. I did the math, and it is mostly bottlenecked by memory bandwidth, so in the future I can expect to run a similarly sized model on my 4090 once it gets retired from gaming service and get ~25 tok/s, which will be very usable.
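The math is roughly this; the bandwidth figures and model size below are ballpark assumptions, not measurements. At batch size 1, decode speed is capped by how fast you can stream all the active weights once per token:

    def max_decode_tok_s(active_params_b, bits_per_weight, bandwidth_gb_s):
        """Upper bound on single-stream decode: one full pass over the active
        weights per generated token, limited by memory bandwidth."""
        bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
        return bandwidth_gb_s * 1e9 / bytes_per_token

    # Assumed: ~27B dense model at Q6; older MBP (~200 GB/s) vs a 4090 (~1000 GB/s)
    print(max_decode_tok_s(27, 6, 200))    # ~10 tok/s ceiling
    print(max_decode_tok_s(27, 6, 1000))   # ~49 tok/s ceiling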

intothemild 2 hours ago | parent | prev | next [-]

I've spent the last month building a small demo of what the future could be like: running Qwen, Gemma, and DeepSeek behind LiteLLM so we can monitor token usage. Instead of some dumb-ass "tokenmaxxing", we're actively trying to get the cost of inference both down and in-house.
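The LiteLLM part is the least interesting bit; roughly like this, with the model name and endpoint as placeholders for whatever we self-host (the usage object is what feeds our token accounting):

    from litellm import completion

    # Same call shape regardless of backend; here it targets a locally hosted
    # model through an Ollama endpoint (placeholder model name).
    response = completion(
        model="ollama/qwen-coder",
        messages=[{"role": "user", "content": "Draft a changelog entry for v1.4."}],
        api_base="http://localhost:11434",
    )

    print(response.choices[0].message.content)
    print(response.usage)  # prompt/completion token counts for internal cost tracking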

Boss is happy, very happy. We're rolling it out more widely now.

But this is the future.

WarmWash 11 minutes ago | parent | prev | next [-]

Linux in the year 2000 vibes... still waiting to get off Windows 26 years later.

nijave 4 hours ago | parent | prev | next [-]

>within a few years

Eventually, we'll see. Frontier models still need some pretty serious hardware, which will slowly come down in cost. Smaller models are becoming more capable, and that will presumably continue.

I think there's still a pretty big gap, though. Claude estimates Opus 4.6 and GLM-5 need about 1.5 TiB of VRAM. It puts gpt-5.5 around 3-6 TiB of VRAM.

That's 8x Nvidia H200 @ ~$30k USD each. Still need some big efficiency improvements and big hardware cost reduction.

snovv_crash an hour ago | parent | next [-]

Qwen 3.6 27b is somewhere around Opus 4. It runs on a 5090, a $2k desktop GPU, at reasonable speeds.

throw1234567891 3 hours ago | parent | prev [-]

Or a single MLX cluster, if one can find second-hand machines somewhere. Difficult to get your hands on today, certainly, but not impossible.

aleqs 2 hours ago | parent | prev | next [-]

Hard agree - the benefits of local/self-hosted models are not just hardware/cost (it might be more expensive at the moment); what you get in exchange is unnerfed/unstupefied models, full cost/usage transparency, optimized/specialized models, privacy/security, etc.

planb 3 hours ago | parent | prev | next [-]

If that’s true, then it will be even cheaper to provide them as a subscription. Following your logic, every company would be running their own data centers instead of using cloud providers.

adrithmetiqa 2 hours ago | parent | prev | next [-]

I disagree. No one will want to use second rate models when the frontier models reach a specific level of capability. Enterprise will keep paying.

malfist 2 hours ago | parent [-]

No one? When free means I get 95% of the capabilities of something very very expensive, you bet your bottom dollar many many people will choose free.

xboxnolifes 19 minutes ago | parent [-]

But it's not free.

jmount 2 hours ago | parent | prev | next [-]

I think this is a good, under-represented point. Again and again, things that could only run on a mainframe get ported down to the personal-device level. However, it looks like the campaign to eliminate the PC (by pre-buying all the RAM) is the counter-stroke.

wolttam 3 hours ago | parent | prev | next [-]

There's still going to be plenty of use cases and demand for frontier models running across hundreds or thousands of GPUs. It's just not going to be in the current shape - certainly not accessed by the general public for rote business tasks.

YesBox 3 hours ago | parent | prev | next [-]

You'd have a point if Cloud™ hadn't taken off into a multi-billion-dollar industry.

himata4113 2 hours ago | parent | prev | next [-]

This is wrong, because local models are very expensive too, just as expensive as the frontier ones.

It would cost me $300 PER DAY at normal DeepSeek v4 pricing (non-discounted), but I get it all for $500 worth of subscriptions.

nozzlegear 14 minutes ago | parent [-]

Why are you paying $300/day to run a local model? The whole point is that you run them on a machine you already own.

otterley 3 hours ago | parent | prev | next [-]

People who are this certain of their predictions should be forced to put real money on them on Kalshi or Polymarket instead of drive-by blowharding on HN.

whackernews 2 hours ago | parent | next [-]

Oooh. You’re hard.

watwut 2 hours ago | parent | prev [-]

Meh, as if having opinions should imply a necessity to gamble on a gambling site.

Not even when that site calls itself a "market" to create plausible deniability.

guesswho_ 4 hours ago | parent | prev [-]

[dead]