airstrike 3 hours ago

This seems to be an attempt to compete with people running local models on Apple hardware—even though those local Mac Mini setups aren't really powerful.

I expect we'll get there in a few years, so perhaps this is Nvidia taking an early step in that direction.

In that case, this goes against Anthropic and OpenAI's business models. Which is a double whammy after Jensen Huang's recent comment about how agentic coding will only increase demand for software engineers, not reduce it.

So it also feels like a part of a budding shift in the competitive tension between the various parts of the AI supply chain.

thewebguyd 2 hours ago | parent | next [-]

Local AI was/is bound to happen, eventually. It'd be smart of Nvidia to get ahead of it.

Non-techy consumers may never do it, but at some point businesses are going to start asking when do they stop paying per token and start running models themselves. Right now the hardware is cost prohibitive, but I doubt that'll always be the case. Eventually the hardware will get cheaper and more available, and Nvidia seems to be betting on that.

They don't care where inference happens, so long as it happens on Nvidia hardware.

h14h 2 hours ago | parent | next [-]

IMO it's only a matter of time before "self-hosting local AI" is as complicated as installing an app and clicking a download button.

And when that happens, the pitch to non-techy users is "Free ChatGPT you can use offline with zero privacy risk". Once hardware accessibility and LLM efficiency advance to the point that this becomes feasible, I suspect it'll result in a much bigger hit to the cloud AI market than many expect.

ribosometronome 16 minutes ago | parent | next [-]

That workflow has been around for a while now. I'm sure there are others, but LM Studio has an in-app model browser that effectively simplifies things to hitting download and hitting launch. The complexity tends to be that there are a lot of models to choose from, plus knowing how to set up whatever tool you're using with a local model. None of it's particularly hard, unless you start trying to customize settings.

I think the bigger hang up is that they're still slower and less capable than the frontier models, especially at the hardware specs most home users are likely to have.

adamrezich 2 hours ago | parent | prev [-]

Why is it only a matter of time? The AI-as-a-service companies are going to continue to improve their products by improving both the part that could be reproduced in a self-hosted setup, but also the “secret sauce” they put on top of that to make it a better product. There is no incentive for this “secret sauce” to be something that can be reproduced for self-hosting, is there?

h14h an hour ago | parent | next [-]

I think a major incentive could be to sell hardware. If Apple is able to get their hands on a local LLM capable of covering a significant % of what people use ChatGPT for, the pitch they can offer is:

"Free, private, offline ChatGPT so long as your laptop has X GB of RAM"

Beyond that, I wouldn't underestimate the incentive of "because I can". The "secret sauce" you refer to is effectively just a DB & a while loop that feeds text to a bunch of tensors. If an indie dev decides they want to release something that dismantles the OpenAI & Anthropic moats, there really isn't all that big of a technical barrier stopping them.

bigyabai an hour ago | parent [-]

LLM inference decode is heavily dependent on memory speed, not just having lots of memory. You can't say "X amount of ram" because the memory bandwidth on an M1 is 68.3 GB/s versus the 614 GB/s of an M5 Max, or a 4090's 1.01 TB/s over GDDR6X.

This basically creates a bottleneck at the oldest/cheapest Apple Silicon machines, which are already crippled for context prefill.
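To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes decode is purely bandwidth-bound, i.e. every generated token streams all the weights from memory once. The 8B model and 4-bit quantization (0.5 bytes per parameter) are illustrative assumptions, and the function name is made up; only the bandwidth figures come from the comment above.

```python
# Rough ceiling on decode speed, assuming each token reads all weights once:
#   tokens/s <= memory_bandwidth / model_size_in_bytes
# Ignores KV-cache traffic, compute limits, and MoE sparsity, so real
# numbers will be lower (or, for sparse models, not directly comparable).

def max_decode_tokens_per_s(bandwidth_gb_s: float,
                            params_billion: float,
                            bytes_per_param: float) -> float:
    """Upper bound on tokens/s for a dense model (hypothetical helper)."""
    model_gb = params_billion * bytes_per_param  # weights footprint in GB
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Bandwidths as quoted in the comment; model size/quantization are assumed.
    machines = [("M1", 68.3), ("M5 Max", 614.0), ("RTX 4090", 1010.0)]
    for name, bw in machines:
        ceiling = max_decode_tokens_per_s(bw, params_billion=8, bytes_per_param=0.5)
        print(f"{name}: ~{ceiling:.0f} tokens/s ceiling for an 8B 4-bit model")
```

The same amount of RAM gives very different ceilings depending on the chip, which is why "X GB of RAM" alone doesn't tell you how usable a local model will be.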

h14h 35 minutes ago | parent [-]

Thanks for clarifying -- I was oversimplifying.

But honestly, obsoleting a huge number of otherwise great Apple Silicon machines is something Apple would probably consider a major "pro" of building a compelling local AI stack.

With how much speculation there has been about the difficult time Apple has had getting people to upgrade from M1, I'm sure they'd jump at such an opportunity.

bijowo1676 12 minutes ago | parent [-]

this might be a way for Apple to milk product revenue for many years.

- Please buy our new Macbook pro M5 that gives you 20 tokens/s on local 80B LLM

next year - Please buy our new Macbook pro M6 that gives you 25 tokens/s on local 80B LLM

milking product revenue in perpetuity by offering meaningful marginal improvements, while keeping same architecture will be the golden goose for Apple

plus, if it lets them segment the market by wallet size into poor/middle/rich classes, that's even better

thewebguyd an hour ago | parent | prev [-]

What secret sauce? We already have open source tooling for tool use, web browsing, and code execution/computer use. Open weight models will win in the end.

AIaaS might keep an edge with multi-modal agentic workflows, but for 80% of general use cases, no "secret sauce" needed, the open weight models are already there, and tooling is constantly getting better.

The bottleneck is the cost of local hardware right now.

artyom 2 hours ago | parent | prev | next [-]

I'm from the times when you had to purchase a separate chip to perform floating point math. It was called a math co-processor. [1]

After a few generations (and over a decade) that was indistinguishable from the CPU chip itself.

It's a long hyperbole, I know, but I think local inference is inevitable; and the big fishes know it.

Will that be a complex technical setup? An appliance? An additional chip in your motherboard? So transparent it's burned right into the CPU? Those are just implementation details. We're probably just one generational breakthrough away from it.

[1] https://en.wikipedia.org/wiki/X87

postalrat an hour ago | parent [-]

Like the math co-processor it might end up just being new instructions for the cpu to handle ai related math.

smrtinsert an hour ago | parent | prev [-]

> Non-techy consumers may never do it

They will. At some point in the future, people will want everything; they'll prompt full movies because they're bored and want to watch something.

hdgvhicv 2 minutes ago | parent [-]

You’re assuming that owning compute will be possible.

c7b 41 minutes ago | parent | prev | next [-]

It's not even anything new, it's basically the mobile version of the DGX Spark. The two chips (N1X/GB10) are pretty similar in terms of architecture and specs. I don't get why this seems to be getting so much attention now.

But I like it. It's a copy of Apple's SoC design philosophy, same as AMD's Strix Halo, which I always thought was really cool both for laptops and home PCs. NVidia's traditional consumer cards pull way too much power and are too noisy to comfortably put them in a living or office environment.

h14h 2 hours ago | parent | prev | next [-]

One can only hope.

That said, Apple's vertical integration is a massive competitive advantage here, IMO. Nvidia's reliance on Microsoft & Windows for software support likely makes competing w/ Apple an uphill battle.

If/when Local AI gets good enough to compete with Cloud AI on most inference workloads, Apple starts to look like Nvidia's biggest competitor.

While this is admittedly a dream scenario, the biggest downside would be Apple effectively having a monopoly in "Agent-ready" consumer electronics. Hopefully local AI both becomes the norm, and there is sufficient competition among the consumer platforms.

Side-note: I would love to see an "RTX Spark" Framework 13 mainboard at some point.

bigyabai an hour ago | parent [-]

I don't understand this stance. Microsoft is reliant on Nvidia, they don't have a good ARM SOC to ship with without them. They will bend over backwards to accommodate these SOCs on Windows, and probably don't have much work to do in the first place.

Apple's vertical integration has led to a Siri overhaul that took half a decade to roll out, and it won't even run locally. They built an NPU coprocessor that's basically dark silicon for expensive inference, and then shipped MLX to stop TensorFlow and PyTorch from replacing Apple's role in the stack entirely. Mac owners are pleading for signed CUDA drivers for the PCIe or Thunderbolt in their $5,000+ Mac Pros. Apple's ecosystem is pure liability for AI: they're not moving any product for datacenter inference and can't even sell the hardware to themselves: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...

Nvidia's profit margins are safe. Even if the RTX Spark is a completely failed product, Apple is not encroaching on the markets that Nvidia dominates.

h14h 41 minutes ago | parent [-]

Fair points all around. Ultimately it all comes down to execution.

In theory, Apple SHOULD have an advantage given they have everything they need in house and can all pull in a unified direction. In practice, it's not always the case that all the teams in a large corporation are all that much better at pulling in the same direction than multiple different corporations in a partnership. And all this will be moot if Local LLMs never catch up to cloud LLMs in terms of quality.

Regardless, it'll be very interesting to see how Nvidia's partnerships with Microsoft & hardware OEMs play out. If the AI inference compute share shifts appreciably to local consumer hardware, I'll want to see strong competition.

bigyabai 16 minutes ago | parent [-]

I'd argue that Apple had the upper hand, but they folded super early. They abandoned OpenCL, which was the most promising CUDA competitor, with industry-wide buy-in from dozens of companies. Then they transitioned to an ecosystem-first mindset that prevented Apple from cooperating to take down Nvidia, and their locked-down software stopped the industry's first high-speed ARM servers from reaching their audience. Nvidia capitalized on both opportunities to the tune of trillions in valuation.

Without Khronos involved, I don't think that Apple has the buy-in to create a real industry-scale CUDA alternative. At this point, it might just be most profitable to support CUDA in macOS and give the people what they want.

bityard an hour ago | parent | prev | next [-]

I don't believe Anthropic and OpenAI are any more fearful of local AI than Google or Microsoft are of people hosting their own email.

Local AI capabilities are growing at a rapid pace, but so is hosted AI. While you can do a surprising amount of useful work with a model occupying a few to a few hundred gigs of VRAM, the hosted models are going to be way ahead for a long time.

grahamburger an hour ago | parent | prev | next [-]

You can do a lot with existing devices in a medium to decent gaming PC (or probably phone/laptop, I haven't tried.) I think HN tends to skew toward only thinking of LLM as useful for coding, but they are very useful for many non-coding things, and existing local LLMs are quite capable. I imagine it won't be long before apps with LLM-based features will try to run locally first and fall back to cloud LLMs just to save token costs. Actually I'd be surprised if some apps aren't doing this already.
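A minimal sketch of what that "local first, cloud fallback" pattern might look like. Everything here is hypothetical: `complete`, `fake_local`, and `fake_cloud` are stand-ins, not any real app's or library's API, and a real app would plug in its own local runtime and cloud client.

```python
from typing import Callable, Tuple


def complete(prompt: str,
             local: Callable[[str], str],
             cloud: Callable[[str], str]) -> Tuple[str, str]:
    """Try the local model first; fall back to the cloud if it fails
    or returns nothing. Returns (backend_used, answer)."""
    try:
        answer = local(prompt)
        if answer.strip():
            return "local", answer
    except Exception:
        # Local model missing, out of memory, prompt too long, etc.
        pass
    return "cloud", cloud(prompt)


# Stand-in backends for illustration only.
def fake_local(prompt: str) -> str:
    if len(prompt) > 200:
        raise RuntimeError("prompt too long for the local model")
    return f"[local] {prompt[:20]}"


def fake_cloud(prompt: str) -> str:
    return f"[cloud] {prompt[:20]}"


if __name__ == "__main__":
    print(complete("summarize this note", fake_local, fake_cloud))
    print(complete("x" * 500, fake_local, fake_cloud))
```

Short prompts stay on-device and cost nothing per token; only the requests the local model can't handle go out to the paid API.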

spamizbad an hour ago | parent | prev | next [-]

Might be aimed at people who spec out the $5100 Macbook Pros with M5 Maxes and 128GB.

spullara an hour ago | parent [-]

definitely! it has the advantage that it can run CUDA kernels but on the other hand it has lower memory bandwidth and probably loses a token/s fight for many LLMs.

mschuster91 41 minutes ago | parent | prev [-]

> In that case, this goes against Anthropic and OpenAI's business models. Which is a double whammy after Jensen Huang's recent comment about how agentic coding will only increase demand for software engineers, not reduce it.

The writing is on the wall: neither Anthropic nor OpenAI is anywhere close to sustainability, and if one or, worse, both fail, the entire demand bubble for NVDA crashes.

It's smart to set up alternative destination markets while they can do so in peace.