| ▲ | tsimionescu 20 hours ago |
| > When you click it, the locally running LLM gets a copy of the web site in the context window, and you get to ask it a prompt, e.g. "summarize this". I'm also now imagining my GPU whirring into life and the accompanying sound of a jet plane getting ready for takeoff, as my battery suddenly starts draining visibly. Local LLMs are a pipe dream; the technology fundamentally requires far too much computation for any true intelligence to ever make sense with current computing technologies. |
|
| ▲ | AuthAuth 20 hours ago | parent | next [-] |
Most laptops are now shipping with an NPU for handling these tasks, so it won't be getting computed on your GPU.
| |
▲ | tsimionescu 20 hours ago | parent [-] | | That doesn't mean anything; it's just a name change. They're the same kind of unit. And whatever accelerator you try to put into it, you're not running Gemini 3 or GPT-5.1 on your laptop, not in any reasonable time frame. | | |
▲ | Intermernet 19 hours ago | parent | next [-] | | Over the last few decades I've seen people make the same comment about spell checking, voice recognition, video encoding, 3D rendering, audio effects and many more. I'm happy to say that LLM usage will only actually become properly integrated into background workflows when we have performant local models. People are madly trying to monetise cloud LLMs before the inevitable rise of local-only LLMs severely diminishes the market. | | |
| ▲ | tsimionescu 12 hours ago | parent | next [-] | | Time will tell, but right now we're not solving the problem of running LLMs by increasing efficiency, we're solving it by massive, unprecedented investments in compute power and just power. Companies definitely weren't building nuclear power stations to power their spell checkers or even 3D renderers. LLMs are unprecedented in this way. | | |
▲ | Intermernet 5 hours ago | parent [-] | | True, but local models are actually getting more useful. I hope that the current unprecedented madness is a function of the perceived potential of cloud models, and not a dismissal of the possibility of local models. It's the biggest swing we've seen (with the possible exception of cloud computing vs local virtualisation), but that may be due to recognition of previous market behaviour and a desperate need not to miss out on the current boom. |
| |
| ▲ | 14 hours ago | parent | prev [-] | | [deleted] |
| |
▲ | AuthAuth 19 hours ago | parent | prev | next [-] | | Also, it does mean something. An NPU is completely different from your 5070. Yes, the 5070 has dedicated AI cores, but it also has raster cores and other things not present in an NPU. You don't need to run GPT-5.1 to summarize a webpage. Models can be small and specialized for different tasks. | | |
▲ | tsimionescu 12 hours ago | parent [-] | | And all of that is irrelevant for the AI use case. The NPU is at best slightly more efficient than a GPU for this use case, and mostly it's just cheaper by forgoing various parts of a GPU that are not useful for AI (and would not be used during inference anyway). And the examples being given of why you'd want AI in your browser are all general text comprehension and conversational discussions about that text, applied to whatever I may be browsing. It doesn't really get less specialized than that. |
| |
▲ | heavyset_go 13 hours ago | parent | prev [-] | | No, NPUs are designed to be power efficient in ways GPU compute isn't. You also don't need Gemini 3 or GPT-anything running locally. | | |
| ▲ | tsimionescu 12 hours ago | parent [-] | | Personally, I don't need AI in my browser at all. But if I did, why would I want to run a crappy model that can't think and hallucinates constantly, instead of using a better model that kinda thinks and doesn't hallucinate quite as often? | | |
▲ | heavyset_go 12 hours ago | parent [-] | | I generally agree with you, but you'd be surprised at what lower-parameter models can accomplish. I've got Nemo 3 running on an iGPU on a shitty laptop with SO-DIMM memory, and it's good enough for my tasks that I have no use for cloud models. Similarly, Granite 4-based models are even smaller, just a couple of gigabytes, and are capable of the automation tasks, summarization, translation, research, etc. someone might want in a browser. Both do chain-of-reasoning / "thinking", both are fast, and once NPU support lands in runtimes, they can be offloaded onto more efficient hardware. They certainly aren't perfect, but at least in my experience, fuzzy accuracy / stochastic inaccuracy is good enough for some tasks. |
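To make the offload point concrete, here is a minimal Python sketch of how runtimes generally expose that choice, using ONNX Runtime's execution providers. The provider names and the model path are illustrative assumptions; which providers actually show up depends on how onnxruntime was built and on the hardware and drivers present.

    import onnxruntime as ort

    # Ask the runtime which execution providers this build/hardware exposes.
    available = ort.get_available_providers()
    print("available providers:", available)

    # Prefer an NPU-backed provider (e.g. QNN on Snapdragon laptops) and fall
    # back to the CPU provider, which is always present.
    preferred = [p for p in ("QNNExecutionProvider", "CPUExecutionProvider")
                 if p in available]

    # "small-model.onnx" is a placeholder for whatever exported model you want
    # to run; the session binds it to the first listed provider that loads.
    session = ort.InferenceSession("small-model.onnx", providers=preferred)
    print("running on:", session.get_providers()[0])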
|
|
|
|
|
| ▲ | starik36 20 hours ago | parent | prev [-] |
That's the point. For things like summarizing a webpage or letting the user ask questions about it, not that much computation is required. An 8B model running under Ollama on a middle-of-the-road MacBook can do this effortlessly today without whirring. In several years, it will probably be true of all laptops.
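As a rough sketch of what that looks like in practice, the snippet below sends page text to a small local model through Ollama's HTTP API (which listens on localhost:11434 by default). The model tag, prompt wording, and placeholder page text are illustrative assumptions, not anything specified above.

    import json
    import urllib.request

    def summarize(page_text, model="llama3.1:8b"):
        # Single non-streaming request to the local Ollama server's
        # /api/generate endpoint; "llama3.1:8b" is just one example of an
        # ~8B model tag that has already been pulled.
        payload = {
            "model": model,
            "prompt": "Summarize this page in a few bullet points:\n\n" + page_text,
            "stream": False,
        }
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(summarize("...text extracted from the page goes here..."))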
| |
▲ | skydhash 19 hours ago | parent | next [-] | | But why would you want to summarize a page? If I'm reading a blog, that means I want to read it, not just a condensed version that might miss the exact information I need for an insight or create something that was never there. | | |
| ▲ | AlotOfReading 18 hours ago | parent | next [-] | | You can also just skim it. It feels like LLM summarization boils down to an argument to substitute technology for media literacy. Plus, the latency on current APIs is often on the order of seconds, on top of whatever the page load time is. We know from decades [0] of research that users don't wait seconds. [0] https://research.google/blog/speed-matters/ | | |
| ▲ | CamperBob2 18 hours ago | parent [-] | | It makes a big difference when the query runs in a sidebar without closing the tab, opening a new one, or otherwise distracting your attention. | | |
| ▲ | johnnyanmac 17 hours ago | parent [-] | | > without closing the tab, opening a new one, or otherwise distracting your attention. well, 2/3 is admirable in this day and age. |
|
| |
| ▲ | CamperBob2 18 hours ago | parent | prev [-] | | You don't use it to summarize pages (or at least I don't), but to help understand content within a page while minimizing distractions. For example: I was browsing a Reddit thread a few hours ago and came upon a comment to the effect of "Bertrand Russell argued for a preemptive nuclear strike on the Soviets at the end of WWII." That seemed to conflict with my prior understanding of Bertrand Russell, to say the least. I figured the poster had confused Russell with von Neumann or Curtis LeMay or somebody, but I didn't want to blow off the comment entirely in case I'd missed something. So I highlighted the comment, right-clicked, and selected "Explain this." Instead of having to spend several minutes or more going down various Google/Wikipedia rabbit holes in another tab or window, the sidebar immediately popped up with a more nuanced explanation of Russell's actual position (which was very poorly represented by the Reddit comment but not 100% out of line with it), complete with citations, along with further notes on how his views evolved over the next few years. It goes without saying how useful this feature is when looking over a math-heavy paper. I sure wish it worked in Acrobat Reader. And I hope a bunch of ludds don't browbeat Mozilla into removing the feature or making it harder to use. | | |
▲ | homebrewer 18 hours ago | parent [-] | | And this explanation is very likely to be entirely hallucinated, or worse, subtly wrong in ways that aren't obvious if you're not already well versed in the subject. So if you care about the truth even a little bit, you then have to go and recheck everything it has "said". Why waste time and energy on the lying machine in the first place? Just yesterday I asked "PhD-level intelligence" for a well-known quote from a famous person because I wasn't able to find it quickly in wikiquotes. It fabricated three different quotes in a row, none of them right. One of them was supposedly from a book that doesn't really exist. So I resorted to a Google search and found what I needed in less time than it took to fight that thing. | | |
▲ | CamperBob2 18 hours ago | parent [-] | | > And this explanation is very likely to be entirely hallucinated, or worse, subtly wrong in ways that aren't obvious if you're not already well versed in the subject. So if you care about the truth even a little bit, you then have to go and recheck everything it has "said". It cited its sources, which is certainly more than you've done. > Just yesterday I asked "PhD-level intelligence" for a well-known quote from a famous person because I wasn't able to find it quickly in wikiquotes. In my experience this means that you typed a poorly formed question into the free instant version of ChatGPT, got an answer worthy of the effort you put into it, and drew a sweeping conclusion that you will now stand by for the next 2-3 years until cognitive dissonance finally catches up with you. But now I'm the one who's making stuff up, I guess. | | |
▲ | homebrewer 18 hours ago | parent [-] | | Unless you've then read through those sources — and not asked the machine to summarize them again — I don't see how that changes anything. Judging by your tone and several assumptions based on nothing, I see that you're fully converted. No reason to keep talking past each other. | | |
| ▲ | CamperBob2 17 hours ago | parent [-] | | No, I'm not "fully converted." I reject the notion that you have to join one cult or the other when it comes to this stuff. I think we've all seen plenty of hallucinated sources, no argument there. Source hallucination wasn't a problem 2-3 years ago simply because LLMs couldn't cite their sources at all. It was a massive problem 1-2 years ago because it happened all the freaking time. It is a much smaller problem today. It still happens too often, especially with the weaker models. I'm personally pretty annoyed that no local model (at least that I can run on my own hardware) is anywhere near as hallucination-resistant as the major non-free, non-local frontier models. In my example, no, I didn't bother confirming the Russell sources in detail, other than to check that they (a) existed and (b) weren't completely irrelevant. I had other stuff to do and don't actually care that much. The comment just struck me as weird, and now I'm better informed thanks to Firefox's AI feature. My takeaway wasn't "Russell wanted to nuke the Russians," but rather "Russell's positions on pacifism and aggression were more nuanced than I thought. Remember to look into this further when/if it comes up again." Where's the harm in that? Can you share what you asked, and what model you were using? I like to collect benchmark questions that show where progress is and is not happening. If your question actually elicited such a crappy response from a leading-edge reasoning model, it sounds like a good one. But if you really did just issue a throwaway prompt to a free/instant model, then trust me, you got a very wrong impression of where the state of the art really is. The free ChatGPT is inexcusably bad. It was still miscounting the r's in "Strawberry" as late as 5.1. | | |
▲ | tsimionescu 12 hours ago | parent [-] | | > I'm personally pretty annoyed that no local model (at least that I can run on my own hardware) is anywhere near as hallucination-resistant as the major non-free, non-local frontier models. And here you get back to my original point: to get good (or at least better) AI, you need complex and huge models that can't realistically run locally. |
|
|
|
|
|
| |
▲ | tsimionescu 12 hours ago | parent | prev | next [-] | | You can just look down-thread at what people actually expect to do - certainly not (just) text summarization. And even for summarization, if you want it to work for any web page (history blog, cooking description, GitHub project, math paper, quantum computing breakthrough), and you want it accurate, you will certainly need way more than an 8B model on Ollama. Add local image processing (since huge amounts of content are not understandable or summarizable without the images used in the content), and you'll see that for a real 99% solution you need models that will not run locally even in very wild dreams. | |
| ▲ | johnnyanmac 17 hours ago | parent | prev [-] | | Sure. Let's solve our memory crisis without triggering WW3 with China over Taiwan first, and maybe then we can talk about adding even more expensive silicon to increasingly expensive laptops. |
|