Tangokat 9 hours ago

"Scaling up performance from M5 and offering the same breakthrough GPU architecture with a Neural Accelerator in each core, M5 Pro and M5 Max deliver up to 4x faster LLM prompt processing than M4 Pro and M4 Max, and up to 8x AI image generation than M1 Pro and M1 Max."

Are they doubling down on local LLMs then?

I still think Apple has a huge opportunity in privacy first LLMs but so far I'm not seeing much execution. Wondering if that will change with the overhaul of Siri this spring.

butILoveLife 9 hours ago | parent | next [-]

I think it's just marketing, and the marketing is working. Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily)

I don't mind it; I own Apple stock. But I'm def not buying into their rebranding of integrated GPU under the guise of Unified Memory.

jsheard 9 hours ago | parent | next [-]

> Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily)

Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS, the only platform which can programmatically interface with iMessage and other Apple ecosystem stuff? It has nothing to do with the hardware really.

Still, buying a brand new Mac Mini for that purpose seems kind of pointless when a used M1 model would achieve the same thing.

ErneX 9 hours ago | parent | next [-]

It’s exactly that. They are buying the base model just for that. You're not going to do much local AI with those 16GB of RAM anyway; it could be useful for small things, but the main purpose of the Mini is being able to interact with the Apple apps and services.

rafaelmn 7 hours ago | parent | next [-]

16GB should be enough for TTS/voice models running locally, no? I was thinking about having a home assistant setup like that, where the voice is local and the brain is API-based.
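
Something like this split is what I'm imagining (a rough sketch: the openai-whisper package is real, but the API endpoint, model name, and response shape are placeholders):

    # Local voice, API brain: transcribe on-device, send only text upstream.
    import whisper    # pip install openai-whisper
    import requests

    # Small Whisper checkpoints fit easily in 16GB; "base" is ~150MB on disk.
    stt = whisper.load_model("base")

    def handle_utterance(wav_path: str) -> str:
        # Transcription happens locally; the audio never leaves the machine.
        text = stt.transcribe(wav_path)["text"]
        # Only the transcript goes to the hosted "brain" (hypothetical endpoint).
        resp = requests.post(
            "https://api.example.com/v1/respond",   # placeholder URL
            json={"model": "some-hosted-model", "input": text},
            timeout=30,
        )
        return resp.json()["output"]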

0x457 4 hours ago | parent | next [-]

I run Ministral for my home knowledge database on a 24GB iMac, plus some other non-agentic LLM things.

ErneX 5 hours ago | parent | prev [-]

Sure, that’s why I said it may be useful for a few things. But the main reason people were recommending the Mini was its price (base model) and having access to the Apple services for clawdbot to leverage. Not precisely for local AI.

chaostheory 8 hours ago | parent | prev [-]

No one is buying a base model Mac for local LLMs. Everyone is forgetting that PC prices have drastically increased due to RAM and SSDs. Meanwhile, Macs had no such price change… at least for the models that didn’t just drop today. Macs are just a good deal at the moment.

jsheard 8 hours ago | parent | next [-]

> Meanwhile, Macs had no such price change

Yeah because Mac upgrade prices were already sky high, long before the component shortage. 32GB of DDR5-6000 for a PC rocketed from $100 to $500, while the cost of adding 16GB to a Mac was and still is $400.

AnthonyMouse 5 hours ago | parent [-]

I'm kind of curious how Apple's supply contracts actually work, because it's currently more attractive than usual to buy a Mac with a lot of RAM, relative to a PC. So if the deal is "we negotiated a price, and you supply as much RAM as the machines we sell need", the company supplying the RAM is getting soaked, because they're having to supply even more RAM to Apple at a below-market price.

But if the contract was for a specific amount of RAM, and people start coming to Apple for high-RAM machines, Apple will exhaust the contract sooner than usual and run out of cheap memory to buy. Then they have to decide whether to lower their margins or raise the already-high price to nosebleed levels.

briffle 8 hours ago | parent | prev [-]

The new models cost $200 more for each 8GB of RAM you add. Ouch.

Forgeties79 an hour ago | parent [-]

That's been the case for years. Not new to the M5s.

philistine 9 hours ago | parent | prev | next [-]

There are so few used Mac Minis around; those are all gone, and what's left is to buy new.

jermaustin1 8 hours ago | parent | next [-]

Worse than that, they hold their value, so a used M1 Mini still costs a few hundred bucks, and saving $200-300 by purchasing a five-generations-older Mini seems like a bad deal in comparison.

teaearlgraycold 5 hours ago | parent [-]

Someone came to me excited they got a "deal" on the newest Intel Mac Mini for hosting OpenClaw: the 8GB model for $300. I kind of regret bursting their bubble by telling them you can walk over to Costco (the nearest one at the time of discussion was within walking distance) and pay $550 for one with an M4 and 16GB of RAM.

Octoth0rpe 3 hours ago | parent [-]

Up until a week ago, the base M4 Mini (16GB RAM/256GB SSD) was $399 at Micro Center; now it's $499. Pretty shocking how good of a value that is, IMO.

someperson 8 hours ago | parent | prev [-]

Just like with GPUs and Bitcoin, there'll be a flood of old hardware on the market eventually.

BeetleB 8 hours ago | parent | prev | next [-]

Can't they simply run macOS in a VM on existing Mac hardware?

shuckles 8 hours ago | parent | next [-]

You aren’t going to run a network-connected, 24/7-online agent from a laptop, because it's battery-powered and portable.

sneak 7 hours ago | parent | prev [-]

Not if you want it to be able to use the hardware identifiers to register for use with iMessage.

jen20 an hour ago | parent [-]

Not true as of macOS 15 onwards [1].

[1]: https://developer.apple.com/documentation/virtualization/usi...

re-thc 9 hours ago | parent | prev | next [-]

> Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS

That's likely only part of the reason. The Mac Mini is now "cheap" because everything else exploded in price: RAM, SSDs, etc. have all gone up massively. Not to mention the Mac Mini is an easy out-of-the-box experience.

CrazyStat 8 hours ago | parent | next [-]

It's not cheap, though. Two weeks ago I bought a computer with a similar form factor (GMKtec G10): worse CPU and GPU, but the same 16GB of memory and a larger SSD, for 40% of the price of a base Mac Mini ($239 vs $599). It came with Windows preinstalled, but I immediately wiped that to install Linux. Even a used (M-series) Mac Mini is substantially more expensive. It will cost me about an extra penny per day in electricity over a Mac Mini, but I won't be alive long enough for the Mac Mini to catch up on that metric.
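
(Back-of-envelope on that last point, using my own numbers; your power rates will vary:

    price_gap = 599 - 239       # base Mac Mini vs GMKtec G10, USD
    extra_per_day = 0.01        # ~1 cent/day more electricity for the G10
    print(price_gap / extra_per_day / 365)   # ~98.6 years to break even

So yes, roughly a century.)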

I considered the Mac Mini at the time, but it only makes sense if you need the local processing power or the Apple ecosystem integration. It's certainly not cheaper if you just need a small box to make API calls and do minimal local processing.

stanmancan 8 hours ago | parent | next [-]

It's cheap for what you get.

If you just need "a small box to make API calls and do minimal local processing", you can also just buy an RPi for a fraction of the price of the GMKtec G10.

All three serve different purposes; just because you can buy a slower machine for less doesn't mean the price:performance of the M1 Mac Mini changes.

kllrnohj 6 hours ago | parent | next [-]

> you can also just buy an RPi for a fraction of the price of the GMKtec G10.

Sadly, not really. The Pi 5 8GB CanaKit starter kit, which feels like a truer price since it includes the power supply, microSD card, and case, is now $210. The Pi 5 8GB by itself is $135.

A 16GB Pi 5 kit, to match just the RAM capacity, to say nothing of the differences in storage (size, speed, quality) and networking, is then an eye-watering $300.

edm0nd 3 hours ago | parent | prev [-]

>you can also just buy an RPi for a fraction of the price

lol. You need to look at RPi 5 prices again. They are insane.

nicoburns 8 hours ago | parent | prev | next [-]

If you need the CPU power in the Mac Mini then it is a pretty good price-to-performance ratio.

re-thc 8 hours ago | parent | prev [-]

> It came with Windows preinstalled, but I immediately wiped that to install Linux.

Do you really need OpenClaw now? And not Claude Code + Zapier, or Claude Code + cron?

That's the point. If you have a worse CPU and GPU, Windows will be sluggish (it's bloated).

renewiltord 5 hours ago | parent | prev | next [-]

Bro. The used M1 Minis and Studios are all gone. I was thinking of buying one for local AI before OpenClaw came out; I went back to look and the order book is near empty. Swappa is cleared out. eBay is to the point that the M1 Studio is selling for at least a thousand more.

This arb you're talking about doesn't exist. An M1 Studio with 64GB was $1300 prior to OpenClaw. You're not getting that today.

I would have preferred that too, since I could Asahi it later. It's just not cheap any more. The M4 is a flat $500 at Micro Center.

llmslave 9 hours ago | parent | prev [-]

Yes, and it's funny that all these critical people don't know this.

rafram 9 hours ago | parent | prev | next [-]

Why not? The integrated GPUs are quite powerful, and having access to 32+ GB of GPU memory is amazing. There's a reason people buy Macs for local LLM work. Nothing else on the market really beats it right now.

mleo 9 hours ago | parent | prev | next [-]

My M4 MacBook Pro for work just came a few weeks ago with 128 GB of RAM. Some simple voice customization started using 90GB. The unified memory value is there.

lizknope 8 hours ago | parent | prev | next [-]

Jeff Geerling had a video of four Mac Studios, each with 512GB of RAM, connected by Thunderbolt. Each machine is around $10K, so this isn't cheap, but the performance is impressive.

https://www.youtube.com/watch?v=x4_RsUxRjKU

Greed 8 hours ago | parent [-]

If $40k is the barrier to entry for "impressive", that doesn't really sell the use case of local LLMs very well.

For the same price in API calls, you could fund AI-driven development across a small team for quite a while.

Whether that remains the case once those models are no longer subsidized, TBD. But as of today the comparison isn't even close.

jazzyjackson 7 hours ago | parent | next [-]

It’s what a small business might have paid for an on-prem web server a couple of decades ago, before the cloud caught on. I figure if a legal or medical practice saw value in LLMs, it wouldn't be a big deal to shove $50k into a closet.

Greed 5 hours ago | parent [-]

You would still have to do some pretty outstanding volume before that makes sense over choosing the "Enterprise" plan from OpenAI or Anthropic if data retention is the motivation.

Assuming, of course, that your legal team signs off on their assurance not to train on or store your data with said Enterprise plans.

LunaSea 4 hours ago | parent [-]

At least with the server, you know what you are buying.

With Anthropic, you're paying for "more tokens than the free plan", which has no concrete meaning.

ttoinou 8 hours ago | parent | prev | next [-]

With an M3 Max with 64GB of unified RAM you can code with a local LLM, so the bar is much lower.

Greed 5 hours ago | parent [-]

But why? Spending several thousand dollars to run sub-par models, when the break-even point could still be years away, seems bizarre for any real use case where your goal is productivity over novelty. Anyone who has used Codex or Opus can attest that the difference between those and a locally available model like Qwen or Codestral is night and day.

To be clear, I totally get the idea of running local LLMs for toy reasons. But in a business context, the sell on a stack of Mac Pros seems misguided at best.

0x457 4 hours ago | parent | next [-]

I started doing it to hedge against the inevitable disappearance of cheap inference.

robotresearcher 3 hours ago | parent | prev | next [-]

Sometimes you can't push your working data to a third-party service: by law, by contract, or by preference.

nurettin 3 hours ago | parent | prev [-]

I ran the Qwen 3.5 35B A3B Q4 model locally on a Ryzen server with a 64k context window, at 5-8 tokens a second.

It's the first local model I've tried that could reason properly, similar to Gemini 2.5 or Sonnet 3.5. I gave it some tools to call and asked Claude to order it around (download quotes, print charts, set up a GNOME extension); even Claude was sort of impressed that it could get the job done.

Point is, it's really close. It isn't Opus 4.5 yet, but it's very promising given the size. Local is definitely getting there, even without GPUs.

But you're right, I see no reason to spend right now.

Greed 36 minutes ago | parent [-]

Getting Opus to call something local sounds interesting, since that's more or less what it's doing with Sonnet anyway if you're using Claude Code. How are you getting it to call out to local models? Skills? Or paying the API costs and using Pi?

spacedcowboy 5 hours ago | parent | prev [-]

It's not. I've got a single one of those 512GB machines and it's pretty damn impressive for a local model.

Greed 4 hours ago | parent [-]

Assuming you worked your way up from what you could fit in 32 or 64GB previously, how noticeable is the difference between those models and what you can run on the 512GB you have now?

I've been working my way up from a 3090 system, and I've been surprised by how underwhelming even the finetunes are for complex coding tasks once you've worked with Opus. Does it get better? As in, noticeably, and not just "hallucinates a few minutes later than usual"?

tcmart14 6 hours ago | parent | prev | next [-]

I'm not really into AI and LLMs; I personally don't like anything they output. But the people I know who are into it, and into running their own local setups, are buying Studios and Minis for their at-home local LLM rigs. Everyone I personally know who is building their own local LLM setup is doing this; I don't know anyone still buying other computers and NVIDIA graphics cards for it.

0x457 4 hours ago | parent | prev | next [-]

I think people buying those don't realize the requirements to run something as big as Opus; they think those gigabytes of memory on a Mac Studio/Mini are a lot, only to find out it's "meh" in the context of LLMs. Plus, most buy it as a gateway into the Apple ecosystem for their Claws (iMessage, for example).

> But I'm def not buying into their rebranding of integrated GPU under the guise of Unified Memory.

But it is Unified Memory? Thanks to Intel iGPUs, the term has been tainted for a long time.

threatofrain 9 hours ago | parent | prev | next [-]

The biggest problem with personal ML workflows on Mac right now is the software.

cmdrmac 8 hours ago | parent [-]

I'm curious to know what software you're referring to.

csullivannet 5 hours ago | parent [-]

Yes

Hamuko 9 hours ago | parent | prev [-]

I've tried to use a local LLM on an M4 Pro machine and it's quite painful. Not surprised that people into LLMs would pay for tokens instead of trying to force their poor MacBooks to do it.

atwrk 9 hours ago | parent | next [-]

Local LLM inference is all about memory bandwidth, and an M4 Pro only has about the same as a Strix Halo or DGX Spark. That's why the older Ultras are popular with the local LLM crowd.
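
Back-of-envelope for why bandwidth dominates (a sketch; the bandwidth figures are the commonly quoted specs, and the model size is illustrative):

    # Decode is memory-bound: each generated token streams all active
    # weights through memory once, so bandwidth sets a hard ceiling.
    def max_tokens_per_sec(bandwidth_gb_s, model_gb):
        return bandwidth_gb_s / model_gb

    model_gb = 40  # e.g. a ~70B model at 4-bit quantization

    print(max_tokens_per_sec(273, model_gb))  # M4 Pro class: ~6.8 tok/s ceiling
    print(max_tokens_per_sec(819, model_gb))  # M3 Ultra: ~20 tok/s ceiling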

usagisushi 6 hours ago | parent | prev | next [-]

Qwen 3.5 35B-A3B and 27B have changed the game for me. I expect we'll see something comparable to Sonnet 4.6 running locally sometime this year.

OtomotO a few seconds ago | parent | next [-]

This would be an absolute game changer for me. I am dictating this text right now with a local model, and I think this is the way to go. I want to have everything local. I'm not opposed to AI or LLMs in general, but I think sending everything over the pond is a no-go. And even if it were European, I still wouldn't want to send everything to some data center. So this would be a good development, and I think I would even buy an Apple device for the first time since the iPod, just for that.

prettyblocks 2 hours ago | parent | prev [-]

Could be, but it likely won't be able to support the massive context window required for performance on par with Sonnet 4.6.

freeone3000 9 hours ago | parent | prev | next [-]

I’m super happy with it for embedding, image recognition, and semantic video segmentation tasks.

giancarlostoro 9 hours ago | parent | prev | next [-]

What are the other specs, and what does your setup look like? You need a minimum of 24GB of RAM to run models of 16GB or less.
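
The rule of thumb behind that number (a sketch; the quantization levels are illustrative, and real usage adds KV cache, activations, and OS overhead, which is where the extra headroom goes):

    # Weights alone: parameters (in billions) x bits per weight / 8 = GB.
    def weights_gb(params_billions, bits_per_weight):
        return params_billions * bits_per_weight / 8

    print(weights_gb(7, 4))    # ~3.5 GB: 7B at 4-bit
    print(weights_gb(14, 8))   # ~14 GB: 14B at 8-bit, tight on a 16GB machine
    print(weights_gb(32, 4))   # ~16 GB: 32B at 4-bit, wants ~24GB with headroom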

jazzyjackson 7 hours ago | parent | next [-]

Tokens per second is abysmal no matter how much RAM you have.

giancarlostoro 5 hours ago | parent [-]

Some models run worse than others, but I have gotten reasonable performance on my M4 Pro with 24GB of RAM.

SV_BubbleTime 9 hours ago | parent | prev | next [-]

This is typically true.

And while it is stupidly slow, you can run models off the hard drive or swap space. You wouldn't do it normally, but it can be done to check an answer in one model versus another.

Hamuko 8 hours ago | parent | prev [-]

48GB MacBook Pro. All of the models I've tried have been slow and have offered terrible results.

giancarlostoro 5 hours ago | parent [-]

Try a piece of software called TG Pro; it lets you override fan settings. Apple likes to let your Mac burn in an inferno before the fans kick in, and overriding that gives me more consistent throughput. I have less RAM than you and I can run some smaller models just fine, with reasonable performance. GPT 20B was one.

andoando 8 hours ago | parent | prev [-]

Local LLMs are useful for stuff like tool calling

renewiltord 5 hours ago | parent [-]

What models are you using? I've found that the SOTA Claudes outperform even GPT-5.2 so hard on this that it's cheaper to just use Sonnet: the number of output tokens needed to solve a problem is so much lower that the TCO is lower. I'm in SF, where home power is 54¢/kWh.

Sonnet is so fast, too. GPT-5.2 needs reasoning tuned up to get tool calling reliable, and Qwen3 Coder Next wasn't close. I haven't tried Qwen3.5-A3B, but I'm hearing rave reviews.

If you're using some model successfully, knowing that alone is very helpful to me.

whizzter 9 hours ago | parent | prev | next [-]

We had a workshop 6 months ago, and while I've always been sceptical of OpenAI et al.'s silly AGI/ASI claims, the investments have shown the way to a lot of new technology and have let a genie out that won't be put back in the bottle.

Now, extrapolating from how Sun servers around the year 2000 cost a fortune and can be emulated by a $5 VPS today, Apple is seeing that they can maybe grab the local LLM workloads if they act now with their integrated chip development.

But to grab that, they need developers to rely less on CUDA via Python, or to have other proper hardware support for those environments, and that won't happen without the hardware being there first and the machines being able to be built with enough memory (refreshing to see Apple support 128GB, even if it'll probably bleed you dry).

fny 9 hours ago | parent | next [-]

I feel like the push by devs towards Metal compatibility has been 10x that towards AMD. I assume that's because the majority of us run MacBooks.

well_ackshually 7 hours ago | parent | next [-]

The only "push" towards Metal compatibility there's been has been complaints on github issues. Not only has none of the work been done, absolutely nobody in their right mind wants to work on Metal compatibility. Replacing proprietary with proprietary is absolutely nobody's weekend project. or paid project.

hnb2137 2 hours ago | parent [-]

If coding by AI were truly solved, then it would be done with AI, right?

whizzter 8 hours ago | parent | prev | next [-]

I think that might be partly because on regular PCs you can just go and buy an NVIDIA card instead of fussing around with software issues, and those on laptops probably hope that something like ZLUDA will solve it via software shims, or that MS-backed ML APIs will.

Basically, too many choices to "focus on" makes none a winner except the incumbent.

pjmlp 7 hours ago | parent | prev | next [-]

Which majority?

I certainly only use Macs when assigned to a project that requires them; then there are plenty of developers out there whose job has nothing to do with what Apple offers.

Also, while Metal is a very cool API, I'd rather play with Vulkan, CUDA, and DirectX, as do the large majority of game developers.

whizzter 7 hours ago | parent [-]

Honestly though, gamedevs really are among the biggest Windows stalwarts, due to SDKs and older 3D software.

The only groups of developers more tied to Windows that I can think of are embedded people stuck with weird hardware SDKs, and Windows Active Directory-dependent enterprise people.

Outside of that, almost everyone hip seems to want a Mac.

pjmlp 7 hours ago | parent | next [-]

80% of the desktop market has to have its applications developed by someone, at least until software replicators replace them.

Everyone hip, alright. Or at least those who dream of earning a salary big enough to afford Apple taxes.

Remember, there are regions of the world where developers barely make 1,000 euros per month.


davidmurdoch 9 hours ago | parent | prev [-]

Who is "us" in this case? Majority of devs that took the stack overflow survey use Windows:

https://survey.stackoverflow.co/2025/technology/#1-computer-...

AdamN 9 hours ago | parent | next [-]

That's the broad developer community. 90%+ of the engineers at Big Tech and the technorati startups are on macOS, with 5% on Linux and the other 5% on Windows.

davidmurdoch 9 hours ago | parent | next [-]

Source?

re-thc 8 hours ago | parent | prev [-]

> 90%+ of the engineers at Big Tech and the technorati startups

The US ones? Is that why we have DeepSeek, and then other non-US open-source LLMs catching up rapidly?

World view, please. The developer community is not US-only.

seanmcdirmid 8 hours ago | parent [-]

You’ll see a lot of MacBooks in Beijing's Zhongguancun, where all the tech companies are, but there are also a lot of students there, so who knows. You have to go out to the suburbs where Lenovo has offices to stop seeing them. I know Apple is common in Western Europe, having lived there for two years (but that was 20 years ago; I lived in China for 9 years after that).

It wouldn't surprise me if the DeepSeek people were primarily using Macs. Maybe Alibaba might be using PCs? I'm not sure.

pdpi 9 hours ago | parent | prev | next [-]

I think it's reasonable to say that the people responding to surveys on Stack Overflow aren't the same people pushing the state of the art in local LLM deployment (which doesn't prove that that crowd is Apple-centric, of course).

davidmurdoch 9 hours ago | parent [-]

Perhaps. Though Windows has been the majority share even when Stack Overflow was at its peak, and before.

petercooper 4 hours ago | parent [-]

It's not the whole answer, but SO came from the .NET world and focused on it first, so it had a disproportionately MS-heavy audience for some time. GitHub had the same issue the other way around: Ruby was one of GitHub's top five languages for its first decade for similar reasons.

JCharante 9 hours ago | parent | prev | next [-]

The majority of devs are in the global south, I presume.

freeone3000 9 hours ago | parent | prev | next [-]

Torch MPS support on my local MacBook outperforms a CUDA T4 on Colab.
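
For anyone who hasn't tried it, getting PyTorch onto the Apple GPU is a few lines (a minimal sketch using the standard MPS backend):

    import torch

    # Fall back to CPU if the Metal backend isn't available.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    x = torch.randn(4096, 4096, device=device)
    w = torch.randn(4096, 4096, device=device)
    y = x @ w  # matmul runs on the Apple GPU via Metal Performance Shaders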

pjmlp 7 hours ago | parent | prev [-]

Except CUDA feels really cozy because, like Microsoft, NVIDIA understands the "Developers, Developers, Developers" mantra.

People always overlook that CUDA is a polyglot ecosystem: the IDE and graphical debugging experience, where one can even single-step through GPU code, and the library ecosystem.

And as of last year, NVIDIA has started to take Python seriously; with the cuTile-based JIT, it is now possible to write CUDA kernels in pure Python, rather than having Python generate C++ code that other tools then ingest.

They are getting ahead of Modular, with Python.

Lalabadie 9 hours ago | parent | prev | next [-]

There already are a bunch of task-specific models running on their devices; it makes sense to maintain and build capacity in that area.

I assume they have a moderate bet on on-device SLMs in addition to other ML models, but not much planned for LLMs, which at that scale might be good as generalists but very poor at guaranteeing success at each specific, minute task you want done.

In short: 8GB storing tens of very small and fast purpose-specific models is much better than a single 8GB LLM trying to do everything.

Munachi1869 9 hours ago | parent | next [-]

Probably possible for pure coding models. I see on-device models becoming viable and usable in like 2-3 years.

tiffanyh 9 hours ago | parent | prev | next [-]

> Are they doubling down on local LLMs then?

Apple is in the hardware business.

They want you to buy their hardware.

People using the cloud for compute is essentially competitive with their core business.

causal 4 hours ago | parent [-]

"Doubling down on already being the best hardware for local inference"

woadwarrior01 8 hours ago | parent | prev | next [-]

> Are they doubling down on local LLMs then?

Neural Accelerators (aka NAX) accelerate matmuls with tile sizes >= 32. From a very high-level perspective, LLM inference has two phases: (chunked) prefill and decode. The former is matmuls (GEMM) and the latter is matrix-vector mults (GEMV). Neural Accelerators make the former (prefill) faster and have no impact on the latter.
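
In code, the two phases look like this (a sketch with illustrative shapes):

    import numpy as np

    d = 4096                    # hidden dimension
    W = np.random.randn(d, d)   # one weight matrix of the model

    # Prefill: the whole prompt hits W at once -> GEMM with big tiles,
    # exactly the shape the Neural Accelerators speed up.
    prompt = np.random.randn(2048, d)   # 2048 prompt tokens
    prefill_out = prompt @ W

    # Decode: one new token per step -> GEMV, effectively tile size 1,
    # below the >= 32 threshold, so it stays memory-bandwidth-bound.
    token = np.random.randn(1, d)
    decode_out = token @ W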

Sharlin 9 hours ago | parent | prev | next [-]

"Apple Intelligence is even more capable while protecting users’ privacy at every step."

Remains to be seen how capable it actually is. But they're certainly trying to sell the privacy aspect.

re-thc 8 hours ago | parent [-]

> Remains to be seen how capable it actually is.

It's the best. We all turned it off. 100% privacy.

caycep 6 hours ago | parent | prev | next [-]

Given all the supply issues with NVIDIA, I think Apple's AI strategy should be local AI for everything (not just LLMs), while also making Metal competitive with CUDA. Their ace in the hole is the unified memory model.

aurareturn 9 hours ago | parent | prev | next [-]

  Are they doubling down on local LLMs then?
Neural Accelerators were already present in the iPhone 17 and the M5 chip; this is not new with the M5 Pro/Max.

Apple's stated AI strategy is local where it can, cloud where it needs to. So "doubling down"? Probably not. But it fits their strategy.

Aurornis 9 hours ago | parent | prev | next [-]

The hardware capabilities that make local LLMs fast are useful for a lot of different AI workloads. Local LLMs are a hot topic right now so that’s what the marketing team is using as an example to make it relatable.

ivankra 9 hours ago | parent | prev | next [-]

But memory bandwidth (the bottleneck for LLM inference) is only marginally improved: 614 GB/s for the M5 Max vs 546 GB/s for the M4 Max. Where is this 4x improvement coming from?

I think I'll pass on upgrading.

singhrac 9 hours ago | parent | next [-]

It’s prompt processing, so prefill: that's compute-bound, not memory-bound.

0x457 4 hours ago | parent | prev [-]

The 4x is on time to first token; it's on the graph.

game_the0ry 9 hours ago | parent | prev | next [-]

> Are they doubling down on local LLMs then?

Honestly, I think that's the move for Apple. They do not seem to have any interest in creating a frontier lab/model; why would they, given the capex and how far behind they are?

But open-source models (Kimi, DeepSeek, Qwen) are getting better and better, and Apple makes excellent hardware for local LLMs. How appealing would it be to have your own LLM that knows all your secrets and doesn't serve you ads/slop, versus OpenAI and SCam Altman having all your secrets? I would seriously consider it even if the performance was not quite there. And no need for a subscription + CLI tool.

I think Apple is in the best position to have native AI, versus the competition, which ends up being edge nodes for the big 4 frontier labs.

iAMkenough 4 hours ago | parent [-]

Re: frontier models/hardware, I'm interested to see what happens with their "Private Cloud Compute" marketing concept now that Siri AI experiences are moving from Apple servers to Google servers.

rafark 4 hours ago | parent | prev | next [-]

> Are they doubling down on local LLMs then?

I love the push toward local LLMs. But it's hilarious that Apple was so reluctant to even mention "AI" in its keynotes a few years ago, and fast-forward a couple of years and they've fully embraced it. I mean, I like that they embraced it rather than being "different" (stubborn) and staying behind the tech industry. It's the smart choice. I just think it's funny.

Someone1234 9 hours ago | parent | prev | next [-]

Apple's AI strategy threads the needle cleverly.

"AI" (LLMs) may or may not have a bubble-pop moment, but until it does, Apple gets to ride it in these press releases and claims. And if the big pop occurs, Apple winds up with really fantastic hardware that just happens to be good at AI workloads (as well as general computing).

For example, image classification (e.g. face recognition/photo tagging), ASR+vocoders, image enhancement, OCR, et al, were popular before the current boom, and will likely remain popular after. Even if LLM usage dries up/falls out of vogue, this hardware still offers a significant user benefit.

lamontcg 6 hours ago | parent | next [-]

LLM usage is not very likely to "dry up".

What is more likely to happen though is that it doesn't take multiple $10B of datacenter and capital to build out models--and the performance against LLM benchmarks starts to max out to the point where throwing more capital at it doesn't make enough of a difference to matter.

Once the costs shrink below $1B then Apple could start building their own models with the $139B in cash and marketable securities that they have--while everyone else has burned through $100B trying to be first.

Of course the problem with this strategy right now is that Siri really, really sucks. They do need to come up with some product improvements now so that they don't get completely lapped.

ChrisGreenHeur 9 hours ago | parent | prev [-]

Those things could likely run just fine on the GPU, though.

Someone1234 9 hours ago | parent | next [-]

They could run fine on the CPU too. But these are mobile devices, so battery usage is another significant metric. Dedicated hardware is more energy-efficient than general-purpose hardware, and the GPU in particular is a power hog.

vel0city 8 hours ago | parent [-]

Exactly. It's the same thing as video or audio encoding and decoding. Sure, the CPU could do it, or potentially the GPU, but having actual hardware encoders and decoders for the most common codecs saves a lot of energy.

Nevermark 7 hours ago | parent | prev [-]

Not if GPU RAM is the limiter, which it is for most models.

Unified memory is a serious architectural improvement.

How many GPUs does it take to match the RAM, and make up for the additional communication overhead, of a RAM-maxed Mac? Whatever the answer, it won't fit in a MacBook Pro's physical and energy envelope, or that of an all-in-one like the Studio.

maherbeg 6 hours ago | parent | prev | next [-]

Honestly, they can keep waiting another year or two for on-device models at the size they're looking for to become powerful enough.

blueTiger33 7 hours ago | parent | prev | next [-]

Have you seen that GitHub repo where they unlock the true power of the NE?

recov 7 hours ago | parent [-]

Have a link?

icar 8 hours ago | parent | prev | next [-]

Didn't they announce a partnership with Google Gemini?

jahller 9 hours ago | parent | prev | next [-]

Looks like this will be their angle for the whole agentic AI topic.

andy_ppp 9 hours ago | parent | prev | next [-]

It is simply marketing nonsense. What they really mean (I think) is that they support matrix multiplication (matmul) at the hardware level; given that AI is mostly matrix multiplications, you'll get much faster inference (and some increase in training too) on this new hardware. I'm looking forward to seeing how fast a local 96GB+ LLM is on the M5 Max with 128GB of RAM.

manmal 6 hours ago | parent [-]

We've already established in this thread that memory bandwidth isn't that much greater than the M4 Max's (12%?). However, I wonder if batched inference will benefit greatly from the vastly improved compute. My guess is that parallel usage of the same model will be a couple of times faster. So single-"threaded" use won't be much better, but if you want to run a lot of batch jobs, it'd be way faster?
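
A back-of-envelope sketch of why it should be (the shapes and byte counts are illustrative):

    # Arithmetic intensity of a (batch x d) @ (d x d) matmul,
    # assuming weight traffic dominates memory movement.
    def arithmetic_intensity(batch, d=4096, bytes_per_weight=2):
        flops = 2 * batch * d * d
        bytes_moved = d * d * bytes_per_weight
        return flops / bytes_moved

    for b in (1, 8, 64):
        print(b, arithmetic_intensity(b))  # scales linearly with batch size

At batch 1 (interactive decode) you're bandwidth-bound; as the batch grows, the same weight reads feed more math, which is exactly where the extra compute would pay off.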

andy_ppp 3 hours ago | parent [-]

Is this a reply to a different comment?

general_reveal 9 hours ago | parent | prev | next [-]

It’s not necessarily doubling down on local. The reality is your LLM should be inferencing every tick … the same way your brain thinks every. Fucking. Nano. Second.

So yes, the LLM should be inferencing on your prompt, but it should also be inferencing on 25,000 other things … in parallel.

Those are the compute needs.

We just need compute everywhere as fast as possible.

kilroy123 9 hours ago | parent | prev | next [-]

I've been so disappointed in Apple's lack of execution on this. There is so much potential for fantastic local models that run on-device and intelligently hand off to cloud models.

I just don't get why they're dropping the ball so badly on this.

NetMageSCW 8 hours ago | parent [-]

Because it won’t sell enough hardware to matter to them.

They aren’t dropping the ball, they are being smart and prudent.

kilroy123 7 hours ago | parent [-]

Downvote all you want. Point blank, they are dropping the ball.

ignoramous 8 hours ago | parent | prev | next [-]

> doubling down on local LLMs

I do think it'll be common to see pros purchasing expensive PCs approaching £25k or more, if they could run SoTA multi-modal LLMs faster and locally.

m3kw9 9 hours ago | parent | prev | next [-]

A useful LLM that needs 64GB of RAM and mid-double-digit core counts is not useful for 99% of their customers. The LLMs they have on the iPhone 17 certainly cannot do anything useful other than summarization and the like. It's a hardware constraint that they have.

jmyeet 9 hours ago | parent | prev | next [-]

Apple absolutely has a massive opportunity here because they use a shared memory architecture.

As most people in or adjacent to the AI space know, NVIDIA gatekeeps its best GPUs, the ones with the most memory, by making them eye-wateringly expensive. It's a form of market segmentation. So consumer GPUs top out at 16GB (5090 currently), while the best AI GPU (the H200?) has 141GB (I just had to search). I think the previous gen was 80GB.

But those GPUs are north of $30k.

Now, the Mac Studio currently tops out at 512GB of shared memory. That means you can potentially run a much larger model locally without distributing it across machines. It retails at $9,500, which is relatively cheap in comparison.

But, as it stands now, the best Apple chips have significantly lower memory bandwidth than NVIDIA GPUs, and that really impacts tokens/second.

So I've been waiting to see whether Apple will realize this and address it in the next generation of Mac Studios (and, to a lesser extent, MacBook Pros). The H200 seems to be 4.8TB/s. IIRC the 5090 is ~1.8TB/s. The best Apple offers is (IIRC) 819GB/s on the M3 Ultra.

Apple could really make a dent in NVIDIA's monopoly here if they address some of these technical limitations.

I just checked the memory bandwidth of these new chips: the M5 is 153GB/s, the M5 Pro ~300, and the M5 Max ~600. That isn't a big jump from the M4 generation, and I suspect the new Studios will probably barely break 1TB/s. I had been hoping for higher.

fridder 5 hours ago | parent | next [-]

It will be interesting to see the specs on an M5 Ultra. We'll probably have to wait until WWDC at the earliest to see it, though.

SirMaster 9 hours ago | parent | prev | next [-]

>So consumer GPUs top out at 16GB (5090 currently)

5090 has 32GB, and the 4090 and 3090 both have 24GB.

ericd 8 hours ago | parent | prev [-]

Hard to get the bandwidth of a 6000+-bit HBM memory bus out of a 512- or 1024-bit bus tied to DDR. I think it's also just tough to physically tie 512 gigs in close enough to the GPU to run at those speeds. But yeah, I wish there were a very competitive local option too, short of spending $50k+.

lakrici88284 9 hours ago | parent | prev | next [-]

[dead]

lynx97 9 hours ago | parent | prev | next [-]

The topic is MacBooks, so my criticism is a little off. However, I really don't believe in this "local LLM" promise from Apple. My phone already gets noticeably warm if I answer 5 WhatsApp messages, and loses 5% of battery in the process. I highly doubt Apple will have a usable local LLM that doesn't drain my battery in minutes before 2030.

cosmic_cheese 9 hours ago | parent [-]

Something is not right if WhatsApp is seriously draining your phone like that. Admittedly I'm not a big WhatsApp user, but my iPhone hasn't had any trouble like that with it.

jakeydus 9 hours ago | parent [-]

Yeah is OP using an iPhone X?

meisel 9 hours ago | parent | prev | next [-]

What % of users actually care that much about local LLMs? They still appear to be an inferior (though maybe decent) option compared to ChatGPT etc., and they require very top-end hardware. Is privacy _that_ important to people when their Google search history has been a gateway to the soul for years? I wonder if these machines would cost significantly less (or could put the cost toward other things, e.g. more CPU cores) without this emphasis on LLMs.

barrell 9 hours ago | parent [-]

Privacy is definitely not a concern for the layman, but it is for lots of people, especially pro users. I also haven't made a Google search in years.

And I haven't seen any improvements in the frontier models in years, so I'm anxiously awaiting local models catching up.

neya 8 hours ago | parent | prev [-]

> I still think Apple has a huge opportunity in privacy first LLMs

This correlation of Apple and privacy needs to be put to rest. They have consistently proven otherwise, despite heavily marketing themselves as "privacy-first".

https://www.theguardian.com/technology/2019/jul/26/apple-con...

4fterd4rk 8 hours ago | parent | next [-]

I think it's a little telling that the best you can do is a seven-year-old article.

neya 7 hours ago | parent | next [-]

So, somehow they are now beacons of privacy, and we should just ignore their history of spying on their users?

lern_too_spel 7 hours ago | parent | prev [-]

No other company makes you tell them every application you install on your device. No other company makes you tell them every location you read from your GPS sensor.

matthewfcarlson 6 hours ago | parent | prev | next [-]

I think it's all about relativity. Are they private compared to an open-source, privacy-focused OS like GrapheneOS and the fantastic folks running that project? No. Are they more private than a company like Meta or Google, whose privacy incentives are much worse than Apple's? Probably.

Do I wish Apple were way more transparent and gave users more control over Gatekeeper and other controversial features that erode privacy? Absolutely.

chaostheory 8 hours ago | parent | prev [-]

Not for everything. Apple initially focused on edge AI that runs locally per device. It didn't work out well on the first try, but I would still bet on them trying again once compute catches up. Besides, they still have a better track record than the other tech giants.