infecto 5 hours ago

"I am not sure how many people will run AI models locally. It still seems like a niche application to me. However, it will make decent machines to play video games."

I don't know who will be the winner but with some of the recent releases from gemma it seems more probable that you may run some models locally if only from a cost perspective, not even considering business security. Not sure how this type of architecture would make for good gaming though, puts into question the whole statement.

"Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note but this guy puts this everywhere, gives me probably the inverse of what he is marketing for.

root-parent 4 hours ago | parent | next [-]

"I am not sure how many people will run AI models locally. It still seems like a niche application to me. However, it will make decent machines to play video games..."

This is the 2026 edition of Ken Olsen: "There is no reason anyone would want a computer in their home"

throw0101a 4 hours ago | parent | next [-]

> This is the 2026 edition of Ken Olsen: "There is no reason anyone would want a computer in their home"

Digging into this:

> In conclusion, there is evidence that Ken Olsen did doubt the need for computers in the home, but the evidence is based primarily on the testimony of David Ahl who was perturbed when the personal computer project he championed at DEC was not supported by Olsen in 1974.

> Olsen’s resistance may have been similar to that expressed by another DEC executive, Gordon Bell. In 1980 Bell thought home terminals would act as gateways to remote computers which would provide appropriate services.

* https://quoteinvestigator.com/2017/09/14/home-computer/

It was supposedly said in 1977: most computers at that time were not small, and so it would not be surprising that people would not expect the general public to desire a large, power-hungry, noisy apparatus in their house.

wccrawford an hour ago | parent | next [-]

That's exactly the point. Until recently, AI models that could run on home machines were so bad that it was very hard to imagine anyone wanting to.

And, like the overly large machines of 1977, models are getting faster, leaner, and better. It's happening a lot quicker, though.

kristov 2 hours ago | parent | prev | next [-]

We kinda ended up with terminals connected to mainframes anyway. The terminal being the web browser, and the mainframe being SaaS. So it wasn't that far off.

supermatt an hour ago | parent [-]

the network is the computer

wslh 2 hours ago | parent | prev | next [-]

The simple explanation is that predicting the future is generally impossible. It doesn't matter if it's Olsen or anybody else.

parineum 3 hours ago | parent | prev [-]

It doesn't really need this much explanation.

People take these quotes out of context all the time. Said in a business context, there was no need, at that time, for someone to have a personal computer.

There's no business justification in 1977 for a personal computer department at a business. It's similar to the gates quote about RAM (I think it was 64KB?).

These statements aren't meant to be forever quotes. They're business plan quotes.

michaelcampbell 2 hours ago | parent | next [-]

> It's similar to the gates quote about RAM (I think it was 64KB?)

640, and Bill Gates said he either never said that, or at least never remembered having said it. I think there is no evidence anywhere that he did.

https://www.computerworld.com/article/1563853/the-640k-quote...

Shorel 9 minutes ago | parent [-]

That exact quote? No, never. He said something like: current computers at the time had 64kb of RAM, so the OS was designed with a limit of 640kb, and he believed this would give them 10 years of future proofing. As it happened, that limit was reached much faster, in about 6 years.

glimshe 3 hours ago | parent | prev [-]

Or maybe he simply made a mistake. Big deal. This doesn't speak negatively of his other achievements.

shermantanktop 2 hours ago | parent [-]

He had a long career and presumably many successes, and is fallible like the rest of us. But a half-remembered zinger with no context makes for zippier posts I guess.

The early popularity of Minitel, the continued popularity of ssh/tmux, and the web browser itself indicates that bespoke client applications are not the only way. He wasn’t directionally wrong.

joering2 3 hours ago | parent | prev | next [-]

or "640K ought to be enough for anybody."

shermantanktop 2 hours ago | parent | next [-]

https://quoteinvestigator.com/2011/09/08/640k-enough/

Nobody ever said that, at least not as an assertion or prediction. The actual instances of similar language are from multiple people describing their earlier thoughts before they learned it wasn’t true.

throw1234567891 2 hours ago | parent | prev | next [-]

There’s no public proof this was ever said, or, if it was, that it wasn’t taken out of context.

DonHopkins an hour ago | parent | prev [-]

I have that many browser tabs.

AaronAPU 4 hours ago | parent | prev | next [-]

That’s too strong of an assertion.

Local models aren’t deterministically equivalent in capabilities to foundation models. Home computers are turing complete; just like a mainframe. They are just slower. Often not slower enough to matter.

sandworm101 4 hours ago | parent [-]

Most people are ok with slower. An AI that lets you edit a family picture, in say 30 seconds, locally is preferable to one that is instantaneous but requires you to submit that picture to examination/storage/training/sale in someone else's AI ecosystem. If I want to crop my ex out of family photos, I should not have to first give that photo to Microsoft. If I want an LLM to write a book report for me, I don't want it also alerting my school. And if I write a memo for a client, and I want an LLM to check the spelling, I don't want that memo leaked either.

parineum 3 hours ago | parent | next [-]

> Most people are ok with slower. An AI that lets you edit a family picture, in say 30 seconds, locally is preferable to one that is instantaneous but requires you to submit that picture to examination/storage/training/sale in someone else's AI ecosystem.

Maybe if you ask them that question, but if you show them two products, they'll definitely prefer the faster one. 30 seconds is a long time to watch a progress bar.

sandworm101 an hour ago | parent | next [-]

Fast and public, or slow and private. Not everyone wants, or is allowed to, share their data with the AI world. And do not doubt that every bit shared with an AI service will be used for training.

spwa4 2 hours ago | parent | prev [-]

Plus there's the other question. If this thing is slower ... what's the price? The desktop/mini-pc version of this is $3000, after all. At this performance level what is an acceptable price for the laptops?

People definitely aren't going to accept more expensive + slower ...

Pxtl 2 hours ago | parent | prev [-]

I'd like to think so but the existence of Google and Apple and Microsoft's cloud based photo tools with phone integration suggests that's false.

You could run a pretty good home server on $50 of gear and yet we never saw any real adoption of OwnCloud/NextCloud style products as an alternative to Google Drive/Photos or Apple Cloud.

Why should LLM/Transformers be any different? Especially when you need a proper expensive GPU to run them instead of a Raspberry Pi?

thewebguyd 2 hours ago | parent | next [-]

Apple's photo tools run on device, and they'll probably ship more on device foundation models at WWDC too.

On-device AI is going to be important, I think. It doesn't have to take the form of a chatbot UI to be useful.

com2kid an hour ago | parent | prev [-]

After the latest round of cloud storage price increases my non technical wife has been asking if we can do local backups instead...

fg137 2 hours ago | parent | prev [-]

You seriously think running an LLM is the same thing as general computing?

ako an hour ago | parent [-]

It’s better, it’s useful even for those who don’t have a deep knowledge of computers. I’d expect more AI users than programmers, than ms-word users, than excel users.

jb1991 2 hours ago | parent | prev | next [-]

He’s just a braggart. When you see something like this in somebody’s personal bio on social media, it’s basically a banner that means “take everything I say in the context of me promoting myself.”

strictnein an hour ago | parent | prev | next [-]

I also don't get why this twitter user is linked here, versus all the news articles about this new hardware that have been everywhere over the past number of days.

smcleod 2 hours ago | parent | prev | next [-]

Qwen 3.6 is far ahead of Gemma for most (but not all) things. I've deployed it out across a number of M5 MacBooks and it's genuinely useful for many tasks. It won't replace an Opus or current gen Sonnet sized model but it's still amazingly good for its size and probably as good as or just a bit before Sonnet 4 era. Far more reliable for tool calling, coding, agentic tasks and faster than the Gemma models especially with MTP.

zozbot234 2 hours ago | parent | next [-]

Qwen 3.6 is a toy compared to DeepSeek V4 Flash or Pro. These models can now run on Apple Silicon hardware with as little as 32GB RAM for the Flash (with 2-bit quant, which is still quite capable) using SSD offloading, with just-about-reasonable performance for interactive use, and far better performance on longer contexts than Qwen (due to the more efficient KV cache/attention mechanisms in DeepSeek).

Very significant improvements may be viable for unattended inference via large-scale batches, which can reuse sparse experts and thereby mask some of the latency involved - this is quite unique to DeepSeek, again due to its efficient KV cache.

greenavocado an hour ago | parent [-]

Qwen 3.6 27B still curb stomps Deepseek V4 in coding

epolanski an hour ago | parent [-]

1. Deepseek V4 is still in preview (training is not finished)

2. Qwen is much more demanding and borderline unusable on consumer hardware because it's a dense model. All 27B parameters are active all the time, for every token. It's not a MoE architecture where a router activates only some of them.

3. Qwen doesn't like quantization at all.
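To put rough numbers on the dense-vs-MoE point, here is a minimal back-of-the-envelope sketch. It assumes single-stream decoding is limited purely by memory bandwidth, so each generated token has to stream every active weight once. The 250 GB/s bandwidth figure and the 8-bit weights are hypothetical placeholders, and the 3B-active MoE is just the 35B-A3B shape mentioned elsewhere in the thread; real throughput also depends on KV cache reads, compute, and the runtime.

```python
def bytes_read_per_token(active_params: float, bits_per_weight: float) -> float:
    """Approximate weight bytes streamed from memory to decode one token."""
    return active_params * bits_per_weight / 8


def decode_ceiling_tok_per_s(
    active_params: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Upper bound on tokens/sec if memory bandwidth were the only limit."""
    return bandwidth_gb_s * 1e9 / bytes_read_per_token(active_params, bits_per_weight)


# Hypothetical machine: 250 GB/s of memory bandwidth, 8-bit weights.
BANDWIDTH_GB_S = 250.0
BITS = 8

# Dense model: all 27B parameters are touched for every token.
dense_ceiling = decode_ceiling_tok_per_s(27e9, BITS, BANDWIDTH_GB_S)

# MoE model: only ~3B parameters are active per token (assumed A3B shape).
moe_ceiling = decode_ceiling_tok_per_s(3e9, BITS, BANDWIDTH_GB_S)

print(f"dense 27B ceiling: {dense_ceiling:.1f} tok/s")
print(f"MoE 3B-active ceiling: {moe_ceiling:.1f} tok/s")
print(f"ratio: {moe_ceiling / dense_ceiling:.1f}x")
```

Under these assumptions the ratio is simply 27 / 3 = 9, whatever the bandwidth. The MoE still needs all of its weights resident in memory, so this says nothing about which model fits, only about how fast each can decode.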

kgeist 7 minutes ago | parent | next [-]

I have to disagree with most claims. I run Qwen3.6-27b at 260k context and 40-60 tok/sec. It handles most coding problems as well as Sonnet 4.6 under OpenCode on our production tasks. (As an experiment, I run the same prompts for the same issues in parallel for Qwen 3.6 and Sonnet 4.6 and usually see little difference in performance). I see zero degradation from quantization in practice.

Settings: RTX 5090, 5-bit weights (Unsloth), FP8 KV cache.

Last time I tried running large MoEs on this PC, they had inferior quality at 2-3 bits than much smaller dense models at 5-6 bits, and were way slower anyway.
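For anyone checking whether a setup like this fits, a rough sketch of the weight-memory arithmetic follows. It counts weights only and ignores the KV cache, activations, and runtime overhead, so the real footprint is larger. The 32 GB VRAM figure is my assumption for the card mentioned, and `weight_gb` is just an illustrative helper name.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes for a given quantization width."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 27e9    # dense 27B model, as in the comment above
VRAM_GB = 32.0   # assumed VRAM of the GPU; adjust for your own card

for bits in (16, 8, 5, 2):
    size = weight_gb(PARAMS, bits)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {size:6.2f} GB -> {verdict} in {VRAM_GB:.0f} GB (weights only)")
```

At 5 bits the weights come to roughly 16.9 GB, which leaves headroom for a long-context KV cache. At 16 bits they would need 54 GB and would not fit on a single such card.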

trollbridge 5 minutes ago | parent | prev [-]

You can run the 35B A3B model which is an MoE. Runs great on a 5090.

epolanski an hour ago | parent | prev | next [-]

Qwen suffers quantization a lot, rendering it borderline unusable.

Pxtl 2 hours ago | parent | prev [-]

I've got a Qwen 3.5 running on a 12GB 3060 and it's dumb as a stump but still smart enough to get some useful work done. Since it's my daily driver desktop I haven't jumped to 3.6, since the last time I did I quickly ran out of VRAM and locked up the desktop environment.

But yeah, the Qwen line is pretty impressive on commodity hardware.

derefr 2 hours ago | parent [-]

I must be using LLMs very differently than y'all, because I can't think of a single thing I would rely on an LLM that's "dumb as a stump" to do for me.

To me, LLMs are for asking research questions + exploring design spaces + pointing at codebases to investigate bugs. And those all benefit from the model being as "smart" (in terms of both fluid intelligence and burned-in knowledge) as possible.

I'm guessing there exist problems where "intelligence past a certain point" doesn't matter, so these medium-sized models can match the performance of the bigger models. But what problems might those be?

flatline 3 hours ago | parent | prev | next [-]

The HN crowd is, by and large, not the target audience for his self promotion. I guarantee there is one and this is more or less effective.

unmole 5 hours ago | parent | prev | next [-]

> you may run some models locally if only from a cost perspective

I have a hard time believing running a model on a laptop will be cheaper than running it in a datacenter. Why wouldn't economies of scale apply here as with every other computation?

zozbot234 2 hours ago | parent | next [-]

The datacenter setting has huge economies of scale for low-latency, just-in-time inference using extremely large models, but that's not the only viable use of AI. Batched, unattended inference of possibly smaller and weaker models, while theoretically viable in a datacenter setting, is far from the best use of that hardware. This is where local AI is at its best.

dgellow 5 hours ago | parent | prev | next [-]

A laptop is really a pretty bad form factor to run LLMs. Worst cooling, more expensive memory that you cannot replace, resale value depreciating fast. It’s fine for tinkering, small scale research, and demos but it’s definitely niche.

The vision NVIDIA is selling is pure marketing IMHO

wazdra 4 hours ago | parent | prev | next [-]

This is assuming that you'll be charged only for the fraction of computing that you consumed. But you are actually paying for their infrastructure, for the R&D (and also the computation that went into training the model) etc. It is not clear that, for your own small computations, these kinds of costs are needed, but you will still pay your share of the investment the provider made so that they could serve everyone's computation needs.

hungryhobbit 3 hours ago | parent | next [-]

But, currently ... you're not. AI companies are operating at a loss, and are being subsidized by their investors.

Local may or may not be cheaper than remote now, depending on the details, but the factors you describe won't affect the math nearly as much as they will once that subsidization ends.

wjnc 4 hours ago | parent | prev [-]

In that analogy bigtech AI is currently investing in cleaner air for all of us? We _could_ breathe it through their hose, but might as well breathe it outside.

itishappy 2 hours ago | parent | prev | next [-]

It's cheaper for the AI provider to use your laptop instead of their datacenter.

jerf 3 hours ago | parent | prev | next [-]

What "every other computation"? I seem to have a lot of processing power at my disposal here, between my cell phones, laptops, gaming PCs, various other hardware devices.

You're going to need to analyze the problem much more deeply because it sounds like the standards you are implicitly applying would result in "economically, everything should be centrally hosted" but that is clearly not the result that obtains. Even a modern mid-grade cell phone is no slouch; you may not be running a current-gen frontier AI on it but you certainly can do a lot of other rather intense things locally that would have been laughable 10 years ago, like surprisingly high powered games.

TylerE 2 hours ago | parent | prev [-]

Because economy of scale isn't really the right metric here. A machine you were going to buy anyway essentially has a TCO of $0.

dofm 29 minutes ago | parent [-]

AI models will pretty undeniably affect your electricity bill; yes you already own the computer, but it will cost more to run it if it's doing inference!
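A quick sketch of that arithmetic, with every input a hypothetical placeholder rather than a measurement: 300 W of extra draw while the machine is doing inference, 4 hours of inference a day, and electricity at $0.15 per kWh. Swap in your own numbers.

```python
def monthly_inference_cost(
    extra_watts: float, hours_per_day: float, price_per_kwh: float, days: int = 30
) -> float:
    """Marginal electricity cost of running inference for one month."""
    kwh = extra_watts / 1000 * hours_per_day * days
    return kwh * price_per_kwh


# Hypothetical inputs, not measurements.
cost = monthly_inference_cost(extra_watts=300, hours_per_day=4, price_per_kwh=0.15)
print(f"extra electricity: ${cost:.2f} per month")
```

With these placeholder inputs that is 36 kWh, or about $5.40 a month. So the marginal cost is not zero, but whether it matters depends on what it is being compared against.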

falsemyrmidon 2 hours ago | parent | prev | next [-]

> this guy puts this everywhere, gives me probably the inverse of what he is marketing for.

Do you think he's in mensa too?

bespokedevelopr 3 hours ago | parent | prev | next [-]

The security aspect is the main driver why I’m seeing so many businesses investing in local hardware. They know the models aren’t as good (caveat that they also can’t run Chinese models) and that’s ok. Places that really care about security and data governance already aren’t on the bleeding edge. They wait for the nice stable lts version, they lock down dev machines in frustrating ways and have lots of IT admin layers.

But they also want to taste the sweet fruit of AI so the only way to do this that a CISO will approve is on local air gapped hardware. It’s a niche but still a billion dollar niche.

thewebguyd 2 hours ago | parent [-]

Microsoft is working on this with their new execution containers (https://github.com/microsoft/mxc)

GeekyBear 2 hours ago | parent | prev | next [-]

> However, it will make decent machines to play video games."

Where you will need games to be rewritten for ARM to get full performance, just like on Apple's M series chips.

jayd16 2 hours ago | parent | prev | next [-]

Maybe they just mean from a "it can run a lot of DLSS" perspective.

epolanski an hour ago | parent | prev | next [-]

DeepSeek Flash v4 is the leading local AI on 128GB machines, and DS4 is still in preview (training not finished), no?

Especially on Dwarfstar.

cyanydeez 3 hours ago | parent | prev | next [-]

128GB seems the sweet spot for local models. I can program and install most GitHub projects with opencode and QWEN 32b with mtp.

anyone who's addicted to token throughput is losing the operational knowledge and offline capabilities.

if you aren't moving to the AMD 395 or MACs then you're hitching a ride on the expensive calorie ride

throw1234567891 2 hours ago | parent [-]

If you could buy a 256GB you’d be claiming that 256GB is a sweet spot. But I agree with you. Crack-tokens are not the future.

cyanydeez 2 hours ago | parent [-]

no, the fact that MACs and x86 and soon ARM are all going to have 128GB models in every sector, yeah, sure.

But watching everyone flounder because claude goes down or forcing you on API costs.

I'm programming things that'd take me days with a PC that, without OpenAI's VRAM shenanigans, would cost you $2k.

It's more than just 'this is what I could do' it's definitely about 'this is what anyone could do with a new PC purchase'.

throw1234567891 2 hours ago | parent [-]

You must be unaware that System76 was already selling 192GB machines, mac studios used to be 512GB max. The only reason why we don’t have them anymore is that we are in RAM shortage.

cyanydeez an hour ago | parent [-]

I'm aware you can have more. The term "SWEET SPOT" references an area that anyone/everyone can get to and isn't some magical expensive unicorn.

You're doing what the IT industry has been addicted to for decades: number goes up.

throw1234567891 5 minutes ago | parent [-]

> You're doing what the IT industry has been addicted to for decades: number goes up.

No, I have a hands on experience with bigger models, and understand the advantages of using them.

voidfunc 3 hours ago | parent | prev | next [-]

> "Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers"

This made me laugh. I can only imagine how insufferable this person is to deal with.

unstatusthequo 2 hours ago | parent | prev | next [-]

I hope a family-level AI appliance is a thing later. Local non-cloud assistant that lives in the house, families interact via voice or phones or whatever. Knows the contextual family stuff you need, etc.

Pxtl 2 hours ago | parent [-]

We didn't get people buying family-level file servers for the family photo gallery and documents at any real scale, so i doubt we'll see similar for AI especially when the cost is that much higher for GPUs vs an SBC machine.

sandworm101 5 hours ago | parent | prev | next [-]

Lots of people are already running AI locally. They are the people buying up all the consumer-grade NVIDIA GPUs. What are they doing with them? Well, the same things people with home media or email servers are doing: stuff they don't want to share with the general public.

Zetaphor 4 hours ago | parent [-]

I want to reduce my dependency on companies like Google, OpenAI, and Anthropic. Aside from the concerns of data sharing I'm also not a fan of how they run their operations, for example Anthropic now using xAI's Colossus data center which is poisoning a marginalized community, or OpenAI getting in bed with the military.

Not everything I want to use an LLM for requires "PhD level intelligence", and increasingly I'm finding more uses that involve sharing my personal data.

Yesterday my local model helped me when looking for a doctor who is in-network for my insurance. I threw it a screenshot from the providers search results and it looked up reviews for all of them.

pratnala 3 hours ago | parent | next [-]

Which model are you running?

Zetaphor 3 hours ago | parent [-]

Qwen 3.6 35B-A3B and 27B both at Q8 on a Strix Halo machine

sandworm101 4 hours ago | parent | prev [-]

My local AI is currently upscaling an old British comedy from sub-DVD quality to 1k. (It is not available other than on DVD.) It looks like it will take about a week for my pair of 5060s to chew through the task.

eszed 3 hours ago | parent [-]

Which show?

sandworm101 2 hours ago | parent [-]

Chelmsford 123

I own the DVDs so I'm OK upscaling/editing my own copies for my own use. But if I ran the task on an ai service I would no doubt trigger copyright issues.

SwtCyber 5 hours ago | parent | prev | next [-]

I think the local-model use case is going to become less niche pretty quickly if the models keep getting smaller and more capable. Even if most people do not care about privacy or offline use, the cost argument is pretty strong

iLoveOncall 5 hours ago | parent | prev [-]

> "Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note but this guy puts this everywhere, gives me probably the inverse of what he is marketing for.

Lol yeah seriously, that stinks of "I ask AI to generate a huge amount of bullshit and upload it to pad irrelevant stats".

Absolute loser.

nkurz 4 hours ago | parent | next [-]

I agree that it sends the wrong signal, but actually Daniel is great. He cares tremendously about doing work that is actually real-world useful. I've co-written a few papers with him, and he's really hard working and open to outside suggestions. The danger is that if you send him comments, he'll eventually manage to rope you into writing a new and improved version. Seriously, if you are a non-academic computer scientist with a good idea that you want to publish, he'd be incredibly open to working with you.

As to why he now has this on his blog? I also cringe when I read it. I presume someone told him he should self-promote more, and this is his lame attempt to do so. He's almost certainly the most cited person in his department, but it's entirely possible that none of his colleagues actually know this. Cut him some slack. Self-promotion is not his strength. He's a nerd's nerd, and not a marketer. I'll mention to him that his attempt here might be backfiring when I'm next in contact with him.

hgoel 2 hours ago | parent | next [-]

I kind of get it in the sense that every academic has to make themselves somewhat comfortable with self-promotion even if they don't like it. It's an important part of getting funding, but putting a blurb like that everywhere just hurts his credibility I think.

infecto 3 hours ago | parent | prev | next [-]

I cringe calling it out but it just stood out as it was plastered everywhere and I actually have never seen his links before.

iLoveOncall 4 hours ago | parent | prev [-]

> As to why he now has this on his blog?

He doesn't just have it on his blog, he has it EVERYWHERE. Sometimes 2 or 3 times on the same page.

dgacmu 2 hours ago | parent | prev | next [-]

He's not a loser; he's done some really fun work that many people use daily. I've used his range mapping trick in multiple projects/papers. It's elegant.

It sounds like he's gotten bad advice about how to market himself /or/ this is being marketed to people who have bigger checks to write and whom he believes will be responsive to this kind of marketing. As an academic, it rubs me very wrong - I think it's detrimental to the field when we get into h-index stacking contests or citation count comparisons. But I don't know what incentives he's responding to, which seems important for putting this stuff in context.

(as an aside, it turns out that polars + fastexcel is about 10x faster than pandas + openpyxl for searching that dataset, if anyone else is curious what he was actually talking about. :)

netsharc 5 hours ago | parent | prev | next [-]

I found his website, https://www.lemire.me/en/ , and the "2%" brag is the very first sentence, geez.

Being the top x% is what OnlyFans girls brag about, professor...

And it's not exactly brain surgery, is it? https://www.youtube.com/watch?v=THNPmhBl-8I

Zetaphor 5 hours ago | parent [-]

> Daniel Lemire’s blog is one of the top 50 most popular blogs on Hacker News, the standard tech news aggregation site.

Citation needed

nkurz 4 hours ago | parent [-]

https://refactoringenglish.com/tools/hn-popularity/

thg 3 hours ago | parent [-]

For posterity: It's rank 34 at the time of this comment

SkiFire13 3 hours ago | parent | prev | next [-]

That line looks very cringe indeed, but the guy has some crazy good blogposts on SIMD stuff.

5 hours ago | parent | prev [-]
[deleted]