mackopes 3 days ago

For some time I've had a feeling that Apple actually IS playing the hardware game in the age of AI. Even though they are not actively innovating on AI software or shipping AI products, their hardware (especially the unified memory) is great for running large models locally.

You can't get a consumer-grade GPU with enough VRAM to run a large model, but you can do so with MacBooks.

I wonder if doubling down on that and shipping devices that let you run third party AI models locally and privately will be their path.

If only they made their unified memory bandwidth higher, as that seems to be the biggest bottleneck for LLM tok/s performance.
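Rough intuition for why bandwidth dominates: batch-1 decoding streams essentially all of the weights through memory once per generated token, so tok/s is capped at roughly bandwidth divided by model size. A minimal sketch, with made-up illustrative numbers:

    # upper bound on decode speed, assuming a purely memory-bandwidth-bound
    # workload (ignores compute, KV cache, and software overhead)
    def max_tok_per_s(params_billions, bits_per_weight, bandwidth_gb_s):
        model_gb = params_billions * bits_per_weight / 8
        return bandwidth_gb_s / model_gb

    print(max_tok_per_s(70, 4, 400))  # 70B at q4 (~35 GB) on a 400 GB/s machine -> ~11 tok/s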

ChocolateGod 3 days ago | parent | next [-]

> You can't get a consumer-grade GPU with enough VRAM to run a large model, but you can do so with MacBooks.

You can if you're willing to trust a modded GPU with leaked firmware from a Chinese backshop

Firerouge 3 days ago | parent [-]

Short of flying to China and buying in person, how can an American find/get one of these?

gmays 3 days ago | parent | prev | next [-]

True, but Apple is a consumer hardware company, which requires billions of users at their scale.

We may care about running LLMs locally, but 99% of consumers don't. They want the easiest/cheapest path, which will always be the cloud models. Spending ~$6k (what my M4 Max cost) every N years, since models and hardware keep improving, just to run a somewhat decent model locally isn't a consumer thing. Nonviable for a consumer hardware business at Apple's scale.

karmakaze 3 days ago | parent | prev | next [-]

On a hypothetical 70B q4 model, the Ryzen AI Max+ 395 (128GB memory with 96GB allocated to the iGPU) delivers ~2–5 tokens/sec, slightly trailing the M4 Max’s ~3–7 tokens/sec. I expect AMD's next generation can easily catch up to or surpass the M4 Max.
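Those ranges track memory bandwidth pretty closely. A rough sanity check, assuming ~256 GB/s for the Ryzen AI Max+ 395 and ~546 GB/s for a full M4 Max (approximate spec figures, not measured), and ~35 GB of weights for a 70B q4 model:

    # bandwidth-bound ceilings for a ~35 GB (70B q4) model; real-world
    # speeds land below these, consistent with the ranges above
    model_gb = 70 * 4 / 8        # ~35 GB of weights
    print(256 / model_gb)        # Ryzen AI Max+ 395: ~7.3 tok/s ceiling
    print(546 / model_gb)        # M4 Max:            ~15.6 tok/s ceiling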

A pair of MaxSun/Intel Arc B60 48GB GPUs (dual 24GB B580's on one card) for $1200 each also outperforms the M4 Max.

tyleo 3 days ago | parent | next [-]

This isn’t a great point. “A hypothetical model with hypothetical hardware will beat Apple on a hypothetical timeline.”

The tangible hardware you point out is $2,400 for two niche-specific components, vs. Apple hardware that benefits more general use cases.

insane_dreamer 3 days ago | parent | prev [-]

> A pair of MaxSun/Intel Arc B60 48GB GPUs (dual 24GB B580's on one card)

please point me to the laptop with these

csomar 3 days ago | parent | prev | next [-]

This. If we plateau around current SOTA LLM performance and 192/384GB of memory can run a competitive model, Apple computers could become the new iPhone. They have a unique and unmatched product because of their hardware investment.

Of course, nobody knows how this will eventually play out, and people without inside information on what these big organizations have can't make such predictions.

orbifold 3 days ago | parent | prev | next [-]

I think it is a given that they are aiming for a fully custom training cluster with custom training chips and inference hardware. That would align well with their abilities, and it actually isn't too hard for them to pull off given that they already have very decent processors, GPUs, and NPUs.

vonneumannstan 3 days ago | parent | next [-]

>I think it is a given that they are aiming for a fully custom training cluster with custom training chips and inference hardware.

It is? I haven't seen anything about this.

billbrown 3 days ago | parent | prev [-]

They're working (almost done) on a CUDA backend for MLX, their Apple Silicon ML framework:

https://github.com/ml-explore/mlx/pull/1983
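For context: MLX is Apple's array framework, and the appeal of that PR is that MLX code, roughly like the minimal sketch below, could then run on CUDA hardware as well as Apple GPUs. Illustrative only, not taken from the PR.

    import mlx.core as mx

    # MLX is lazy: ops build a graph and mx.eval() runs it on whatever
    # backend is active (Apple GPU today; CUDA once that backend lands)
    a = mx.random.normal((1024, 1024))
    b = mx.random.normal((1024, 1024))
    c = a @ b
    mx.eval(c)
    print(c.shape, c.dtype)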

stefan_ 3 days ago | parent | prev [-]

Memory is not in any way or shape some crucial advantage; you were just tricked into thinking that because it's used for market segmentation, and nobody wants to slaughter their datacenter cash cow. Inference, and god forbid training, on consumer Apple hardware is terrible and behind.

mackopes 3 days ago | parent [-]

Show me other consumer hardware that handles inference and/or training better. How many RTX 5090s would you need?
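Quick capacity check, assuming 32 GB per RTX 5090 and a 4-bit quantized model; this ignores KV cache and runtime overhead, so real requirements are a bit higher:

    import math

    # how many 32 GB cards are needed just to hold q4 weights
    def cards_needed(params_billions, bits_per_weight=4, vram_gb=32):
        weights_gb = params_billions * bits_per_weight / 8
        return math.ceil(weights_gb / vram_gb)

    print(cards_needed(70))    # ~35 GB of weights  -> 2 cards
    print(cards_needed(405))   # ~203 GB of weights -> 7 cards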

spogbiper 3 days ago | parent | next [-]

https://liliputing.com/nvidia-dgx-spark-is-3000-ai-supercomp...

Looks like there will be several good options "soon"?

owebmaster 3 days ago | parent [-]

This is cool! Nvidia should sell notebooks, too.

spogbiper 3 days ago | parent [-]

I think Nvidia is trying to create a sort of reference platform and have other OEMs produce mass-market products, so a laptop might happen even if Nvidia doesn't make one themselves.

NitpickLawyer 3 days ago | parent | prev [-]

For local inference, Macs have indeed shone through this whole LLM thing and come out as the preferred device. They are great, the dev experience is good, and speeds are OK-ish (a bit slower with the new "thinking" models / agentic use with lots of context, but still manageable).

But Nvidia isn't that far behind, and has already moved to regain some space with their RTX PRO 6000 "workstation" GPUs. You get 96GB of VRAM for ~$7.5k, which is more than a Mac with comparable RAM, but not the ~$30k you previously had to shell out for top-of-the-line GPUs. So you get a "prosumer" 5090 with a bit more compute and 3x the VRAM, in a computer that can sell for <$10k and beat any Mac at both inference and training, for things that fit in that VRAM.

Macs still have the advantage for larger models, though. The new DGX Spark should join that market soon(tm), but they allegedly ran into problems on several fronts. We'll have to wait and see.