JumpCrisscross 4 days ago

I don't think we're anywhere close to running cutting-edge LLMs on our phones or laptops.

What may be around the corner is running great models on a box at home. The AI lives at home. Your thin client talks to it, maybe runs a smaller AI on-device to balance latency and quality. (This would be a natural extension for Apple to go into with its Mac Pro line. $10k to $20k for a home LLM device isn't ridiculous.)

simonw 4 days ago | parent | next [-]

Right now you can run some of the best available open-weight models on a 512GB Mac Studio, which retails for around $10,000. Here's Qwen3-Coder-480B-A35B-Instruct running at 24 tokens/second in 4-bit: https://twitter.com/awnihannun/status/1947771502058672219 and DeepSeek V3 0324 at 20 tokens/second, also in 4-bit: https://twitter.com/awnihannun/status/1904177084609827054

You can also string two 512GB Mac Studios together using MLX to load even larger models - here's 671B 8-bit DeepSeek R1 doing that: https://twitter.com/alexocheema/status/1899735281781411907
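
If you want to try this yourself, the mlx-lm package is the simplest route I know of. A minimal sketch (the model repo name is an assumption; check the mlx-community org on Hugging Face for the current 4-bit conversions):

    # pip install mlx-lm  (Apple silicon only)
    from mlx_lm import load, generate

    # Illustrative repo name; substitute whichever 4-bit conversion you want.
    model, tokenizer = load("mlx-community/Qwen3-Coder-480B-A35B-Instruct-4bit")

    prompt = "Write a function that merges two sorted lists."
    response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)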

zargon 4 days ago | parent [-]

What these tweets about Apple silicon never show you: waiting 20+ minutes for it to ingest 32k context tokens. (Probably a lot longer for these big models.)
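
(For scale: 32k tokens in 20 minutes works out to roughly 27 tokens/second of prompt processing.)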

logicprog 4 days ago | parent [-]

Yeah, I bought a used Mac Studio (an M1, to be fair, but still a Max, and things haven't changed since) hoping to be able to run a decent LLM on it, and was sorely disappointed, especially by the prompt processing speed.

alt227 3 days ago | parent [-]

No offense to you personally, but I find it very funny when people hear marketing copy for a product and assume it can do everything the marketing says it can.

Apple silicon is still just a single consumer-grade chip. It might be able to run certain end-user software well, but it cannot replace a server rack of GPUs.

zargon 3 days ago | parent [-]

I don’t think this is a fair take in this particular situation. My comment is in response to Simon Willison, who has a very popular blog in the LLM space. This isn’t company marketing copy; it’s trusted third parties spreading this misleading information.

brokencode 4 days ago | parent | prev | next [-]

Not sure about the Mac Pro, since you pay a lot for the big fancy case. The Studio seems more sensible.

And of course Nvidia and AMD are coming out with options for massive amounts of high bandwidth GPU memory in desktop form factors.

I like the idea of having basically a local LLM server that your laptop or other devices can connect to. Then your laptop doesn’t have to burn its battery on LLM work and it’s still local.

JumpCrisscross 4 days ago | parent | next [-]

> Not sure about the Mac Pro, since you pay a lot for the big fancy case. The Studio seems more sensible

Oh wow, a maxed out Studio could run a 600B parameter model entirely in memory. Not bad for $12k.
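
(At 4-bit that's on the order of 300 GB of weights for ~600B parameters, so it fits in 512 GB with room left over for context.)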

There may be a business in creating the software that links that box to an app on your phone.

simonw 4 days ago | parent | next [-]

I have been using a Tailscale VPN to make LM Studio and Ollama running on my Mac available to my iPhone when I leave the house.
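
On the phone side it's just an OpenAI-compatible endpoint at the Mac's tailnet hostname. A rough sketch in Python (the hostname and model name are made up; LM Studio's local server defaults to port 1234, Ollama's to 11434):

    # pip install openai  (any OpenAI-compatible client works)
    from openai import OpenAI

    # MagicDNS name of the Mac on my tailnet (yours will differ),
    # pointing at LM Studio's default local server port.
    client = OpenAI(
        base_url="http://mac-studio.tailnet-example.ts.net:1234/v1",
        api_key="not-needed-locally",
    )

    resp = client.chat.completions.create(
        model="local-model",  # whatever name the local server exposes
        messages=[{"role": "user", "content": "Hello from outside the house"}],
    )
    print(resp.choices[0].message.content)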

brokencode 4 days ago | parent | prev | next [-]

Perhaps said software could even form an end-to-end encrypted tunnel from your phone to your local LLM server anywhere over the internet via a simple server intermediary.

The amount of data transferred is tiny and the latency costs are typically going to be dominated by the LLM inference anyway. Not much advantage to doing LAN only except that you don’t need a server.
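
A sketch of just the end-to-end encryption part (not the relay), assuming the phone app and the home box share a key provisioned out of band, e.g. by scanning a QR code; the intermediary only ever forwards ciphertext:

    # pip install cryptography
    import json
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()           # shared once between phone and home box (assumption)
    phone, home_box = Fernet(key), Fernet(key)

    request = json.dumps({"messages": [{"role": "user", "content": "Summarize this contract"}]})
    ciphertext = phone.encrypt(request.encode())       # all the relay server ever sees
    plaintext = home_box.decrypt(ciphertext).decode()  # only the home box can read it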

Though the number of people who care enough to buy a $3k-$10k server and set this up, compared to just using ChatGPT, is probably very small.

JumpCrisscross 4 days ago | parent [-]

> number of people who care enough to buy a $3k-$10k server and set this up compared to just using ChatGPT is probably very small

That's for a maxed-out configuration, and it’s with Apple’s margins. I suspect you could do it for $5k.

I’d also note that for heavy ChatGPT users, the difference between the energy cost of a home setup and the price of ChatGPT tokens may make this financially compelling.

brokencode 4 days ago | parent [-]

True, it may be profitable for pro users. At $200 a month for ChatGPT Pro, it may only take a few years to recoup the initial costs. Not sure about energy costs though.

And of course you’d be getting a worse model, since no open source model currently is as good as the best proprietary ones.

Though that gap should narrow as the open models improve and the proprietary ones seemingly plateau.

dghlsakjg 4 days ago | parent | prev [-]

That software is an HTTP request, no?

Any number of AI apps allow you to specify a custom endpoint. As long as your AI server accepts connections to the internet, you're gravy.
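
For the curious, the entire "integration" is something like this (the address and model name are hypothetical; Ollama's OpenAI-compatible server listens on 11434 by default):

    # One HTTP POST to an OpenAI-compatible endpoint on the home box.
    import requests

    resp = requests.post(
        "http://my-home-box.example.com:11434/v1/chat/completions",
        json={
            "model": "gpt-oss:120b",
            "messages": [{"role": "user", "content": "Draft a polite reply to this email."}],
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])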

JumpCrisscross 4 days ago | parent [-]

> That software is an HTTP request, no?

You and I could write it. Most folks couldn’t. If AI plateaus, this would be a good hill to have occupied.

dghlsakjg 4 days ago | parent [-]

My point is, what is there to build?

The person that is willing to buy that appliance is likely heavily overlapped with the person that is more than capable of pointing one of the dozens of existing apps at a custom domain.

Everyone else will continue to just use app based subscriptions.

Streaming platforms have plateaued (at best), but self-hosted media appliances are still vanishingly rare.

Why would AI buck the trend that every other computing service has followed?

itsn0tm3 4 days ago | parent | next [-]

You don’t tell your media player company secrets ;)

I think there is a market here, based solely on actual data privacy. Not sure how big it is, but I can see quite a few companies having a use for it.

dghlsakjg 4 days ago | parent [-]

> You don’t tell your media player company secrets ;)

No, but my email provider has a de-facto repository of incredibly sensitive documents. When you put convenience and cost up against privacy, the market has proven over and over that no one gives a shit.

JumpCrisscross 4 days ago | parent | prev [-]

> what is there to build?

Integrated solution. You buy the box. You download the app. It works like the ChatGPT app, except it's tunneling to the box you have at home which has been preconfigured to work with the app. Maybe you have a subscription to keep everything up to date. Maybe you have an open-source model 'store'.

theshrike79 3 days ago | parent | prev [-]

It's really easy to whip up a simple box that runs a local LLM for a whole home.

Marketing it though? Not doable.

Apple is pretty much the only company I see attempting this with some kind of AppleTV Pro.

data-ottawa 4 days ago | parent | prev | next [-]

This is what I’m doing with my AMD 395+.

I’m running docker containers with different apps and it works well enough for a lot of my use cases.

I mostly use Qwen Code and GPT OSS 120b right now.

When the next generation of this tech comes through I will probably upgrade despite the price, the value is worth it to me.

milgrum 4 days ago | parent [-]

How many TPS do you get running GPT OSS 120b on the 395+? Considering a Framework desktop for a similar use case, but I’ve been reading mixed things about performance (specifically with regards to memory bandwidth, but I’m not sure if that’s really the underlying issue)

data-ottawa 3 days ago | parent [-]

30-40 tokens/second at 64k context, but it's a mixture-of-experts model.

A 70B dense model is slower.

Qwen Coder 30B at Q4 runs at 40+.
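
Rough back-of-envelope on why the MoE is so much faster (all numbers are ballpark assumptions: ~256 GB/s memory bandwidth on the 395+, ~5B active parameters per token for GPT OSS 120b, ~4-bit weights):

    # Memory-bound decode speed is roughly bandwidth / bytes of weights read per token.
    bandwidth = 256e9        # ~bytes/s, LPDDR5X on the AI Max+ 395 (assumption)
    bytes_per_param = 0.5    # ~4-bit quantization

    moe_active = 5.1e9       # gpt-oss-120b activates only ~5B params per token
    dense_70b = 70e9         # a dense 70B touches every weight on every token

    print(bandwidth / (moe_active * bytes_per_param))  # ~100 tok/s theoretical ceiling
    print(bandwidth / (dense_70b * bytes_per_param))   # ~7 tok/s theoretical ceiling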

bigyabai 4 days ago | parent | prev | next [-]

> $10k to $20k for a home LLM device isn't ridiculous.

At that point you are almost paying more than the datacenter does for inference hardware.

JumpCrisscross 4 days ago | parent | next [-]

> At that point you are almost paying more than the datacenter does for inference hardware

Of course. You and I don't have their economies of scale.

bigyabai 4 days ago | parent [-]

Then please excuse me for calling your one-man $10,000 inference device ridiculous.

JumpCrisscross 4 days ago | parent | next [-]

> please excuse me for calling your one-man $10,000 inference device ridiculous

It’s about the inflation-adjusted price of early microcomputers.

Until the frontier stabilizes, this will be the cost of competitive local inference. I’m not pretending that what we can run on a laptop will compete with a data centre.

simonw 4 days ago | parent | prev | next [-]

Plenty of hobbies are significantly more expensive than that.

bigyabai 4 days ago | parent [-]

The rallying cry of money-wasters the world over. "At least it's not avgas!"

seanmcdirmid 4 days ago | parent [-]

Some people lose lots of money on boats, some people buy a fancy computer instead and lose less, although still a lot of, money.

brookst 3 days ago | parent | prev | next [-]

How is it not impressive to be able to do something at quantity 1 for roughly the same price megacorps get at quantity 100,000?

Try building an F1 car at home. I guarantee your unit cost will be several orders of magnitude higher than that of the companies that make several a year.

rpdillon 4 days ago | parent | prev [-]

I mean, not really? Yeah, I pay to go to the movies and sit in a theater that they let me buy a ticket for, but that doesn't mean people who want to set up a nice home theater are ridiculous; they just care more about controlling and customizing their experience.

grim_io 3 days ago | parent [-]

Some would argue that the home theater is a superior experience to a crowded, far away movie theater where the person's head in front of you takes up a quarter of the screen.

The same can't be said for local inference. It is always inferior in experience and quality.

A reasonable home theater pays for itself over time if you watch a lot of movies. Plus you get to watch shows as well, which the limited theater program doesn't allow.

I can buy over 8 years of the Claude Max $100 plan for the price of the 512GB M3 Ultra. And I can't imagine the M3 being great at this after 5 years of hardware advancement.
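
(96 months at $100/month is $9,600, roughly the ~$10k retail price quoted upthread.)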

vonneumannstan 4 days ago | parent | prev [-]

Almost? Isn't a single H100 like $30k, which is the bare minimum to run a big model?

ben_w 4 days ago | parent | prev | next [-]

> $10k to $20k for a home LLM device isn't ridiculous.

That price is ridiculous for most people. Silicon Valley pay scales can afford that much, but look at how few Apple Vision Pros sold at a far lower price.

vonneumannstan 4 days ago | parent | prev [-]

Doesn't gpt-oss-120b perform better across the board at a fraction of the memory? Just specced a $4k Mac Studio with 128 GB of memory that can easily run it.