tsimionescu 20 hours ago

That doesn't mean anything; it's just a name change. They're the same kind of unit.

And whatever accelerator you try to put into it, you're not running Gemini 3 or GPT-5.1 on your laptop, not in any reasonable time frame.

Intermernet 19 hours ago | parent | next [-]

Over the last few decades I've seen people make the same comment about spell checking, voice recognition, video encoding, 3D rendering, audio effects and many more.

I'm happy to say that LLM usage will only become properly integrated into background workflows once we have performant local models.

People are madly trying to monetise cloud LLMs before the inevitable rise of local-only LLMs severely diminishes the market.

tsimionescu 12 hours ago | parent | next [-]

Time will tell, but right now we're not solving the problem of running LLMs by increasing efficiency; we're solving it with massive, unprecedented investments in compute and in raw electrical power. Companies definitely weren't building nuclear power stations to power their spell checkers, or even their 3D renderers. LLMs are unprecedented in this way.

Intermernet 5 hours ago | parent [-]

True, but the usefulness of local models is genuinely improving. I hope the current unprecedented madness is a function of the potential of cloud models, and not a dismissal of the possibility of local ones. It's the biggest swing we've seen (with the possible exception of cloud computing vs local virtualisation), but that may be down to recognition of previous market behaviour and a desperate need not to miss out on the current boom.

AuthAuth 19 hours ago | parent | prev | next [-]

Also, it does mean something. An NPU is completely different from your 5070. Yes, the 5070 has dedicated AI cores, but it also has raster cores and other hardware not present in an NPU.

You don't need to run GPT-5.1 to summarize a webpage. There are small models specialized for exactly these tasks.
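
For example, here's a rough sketch of local summarization with llama-cpp-python. The model filename is a placeholder; any small instruct-tuned GGUF of a few gigabytes works the same way:

    # Sketch: summarize a saved page with a small local model.
    # "small-instruct-q4.gguf" is a placeholder, not a specific model.
    from llama_cpp import Llama

    llm = Llama(model_path="small-instruct-q4.gguf", n_ctx=8192, verbose=False)

    page_text = open("article.txt").read()
    out = llm.create_chat_completion(messages=[
        {"role": "system", "content": "Summarize the text in three bullet points."},
        {"role": "user", "content": page_text},
    ])
    print(out["choices"][0]["message"]["content"])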

tsimionescu 12 hours ago | parent [-]

And all of that is irrelevant for the AI use case. The NPU is at best slightly more efficient than a GPU here, and mostly it's just cheaper because it forgoes the parts of a GPU that aren't useful for AI (and wouldn't be exercised during inference anyway).

And the examples given for why you'd want AI in your browser are all general text comprehension and conversational discussion of that text, applied to whatever I happen to be browsing. It doesn't get much less specialized than that.

heavyset_go 13 hours ago | parent | prev [-]

No, NPUs are designed to be power-efficient in ways GPU compute isn't.

You also don't need Gemini 3 or GPT-anything running locally.

tsimionescu 12 hours ago | parent [-]

Personally, I don't need AI in my browser at all. But if I did, why would I want to run a crappy model that can't think and hallucinates constantly, instead of using a better model that kinda thinks and doesn't hallucinate quite as often?

heavyset_go 12 hours ago | parent [-]

I generally agree with you, but you'd be surprised at what lower parameter models can accomplish.

I've got Nemo 3 running on an iGPU on a shitty laptop with SO-DIMM memory, and it's good enough for my tasks that I have no use for cloud models.

Similarly, Granite 4-based models are even smaller, just a couple of gigabytes, and they're capable of the automation, summarization, translation, and research tasks someone might want in a browser.
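
To give an idea of how lightweight that is, here's a sketch asking a local model served by Ollama to translate a snippet. The model tag is an assumption; swap in whatever small model you've pulled:

    import requests

    # Sketch: Ollama's HTTP API on its default port; "granite4:micro"
    # is an assumed tag, not necessarily a real one.
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "granite4:micro",
        "prompt": "Translate to English: 'Das Modell läuft lokal.'",
        "stream": False,
    })
    print(resp.json()["response"])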

Both do chain-of-reasoning / "thinking", both are fast, and once NPU support lands in the runtimes, they can be offloaded onto more efficient hardware.
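
To sketch what that offload looks like in ONNX Runtime, for instance: you list an NPU execution provider first and fall back to CPU when it isn't available. The model file here is a placeholder:

    import onnxruntime as ort

    # Sketch: prefer an NPU execution provider (QNN targets Qualcomm
    # NPUs), fall back to CPU. "small-llm.onnx" is a placeholder.
    wanted = ["QNNExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in wanted if p in ort.get_available_providers()]

    session = ort.InferenceSession("small-llm.onnx", providers=providers)
    print(session.get_providers())  # shows which providers actually loaded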

They certainly aren't perfect, but at least in my experience, fuzzy accuracy / stochastic inaccuracy is good enough for some tasks.