Why?

Imagine you have an AI button. When you click it, the locally running LLM gets a copy of the web site in the context window, and you get to ask it a prompt, e.g. "summarize this".

Imagine the browser asks you at some point, whether you want to hear about new features. The buttons offered to you are "FUCK OFF AND NEVER, EVER BOTHER ME AGAIN", "Please show me a summary once a month", "Show timely, non-modal notifications at appropriate times".

Imagine you choose the second option, and at some point, it offers you a feature described as follows: "On search engine result pages and social media sites, use a local LLM to identify headlines, classify them as clickbait-or-not, and for clickbait headlines, automatically fetch the article in an incognito session, and add a small overlay with a non-clickbait version of the title". Would you enable it?

▲

johnnyanmac 17 hours ago | parent | next [-]

>Why?

Do we have to re-tread 3 years of big tech overreach, scams, user hostility in nearly every common program, questionable utility that is backed by hype more than results, and the way it's hoisting up the US economy's otherwise stagnant/weakening GDP?

I don't really have much new to add here. I've hated this "launch in alpha" mentality for nearly a decade. Calling 2022 "alpha" is already a huge stretch.

>When you click it, the locally running LLM gets a copy of the web site in the context window, and you get to ask it a prompt, e.g. "summarize this".

Why is this valuable? I spent my entire childhood reading, and my college years being able to research and navigate technical documents. I don't value auto-summarizations. Proper writing should be able to do this in its opening paragraphs.

>Imagine the browser asks you at some point, whether you want to hear about new features. The buttons offered to you are "FUCK OFF AND NEVER, EVER BOTHER ME AGAIN", "Please show me a summary once a month", "Show timely, non-modal notifications at appropriate times"

Yes, this is my "good enough" compromise that most applications are failing to perform. Let's hope for the best.

>Imagine you choose the second option, and at some point, it offers you a feature described as follows: "On search engine result pages and social media sites, use a local LLM to identify headlines, classify them as clickbait-or-not, and for clickbait headlines, automatically fetch the article in an incognito session, and add a small overlay with a non-clickbait version of the title". Would you enable it?

No, probably not. I don't trust the powers behind such tools to be able to identify what is "clickbait" for me. Grok shows that these are not impartial tools, and news is the last thing I want to outsource sentiment to without a lot of built trust.

meanwhile, trust has only corroded this decade.

▲

evil-olive 15 hours ago | parent | prev | next [-]

> Imagine you have an AI button. When you click it, the locally running LLM

sure, you can imagine Firefox integrating a locally-running LLM if you want.

but meanwhile, in the real world [0]:

> In the next three years, that means investing in AI that reflects the Mozilla Manifesto. It means diversifying revenue beyond search.

if they were going to implement your imagination of a local LLM, there's no reason they'd be talking about "revenue" from LLMs.

but with ChatGPT integrating ads, they absolutely can get revenue by directing users there, in the same way they get money from Google for putting Google's ads in front of Firefox users' eyeballs.

that's ultimately all this is. they're adding more ads to Firefox.

0: https://blog.mozilla.org/en/mozilla/leadership/mozillas-next...

▲

gooob 12 hours ago | parent [-]

not to mention the high resource-usage of a local LLM that most PCs wouldn't be able to handle, or would just drain a laptop's battery.

▲

cons0le 4 hours ago | parent [-]

All for searching something trivial, where in 99% of cases the already-indexed Wikipedia summary is good enough and way faster.

▲

M2Ys4U 13 hours ago | parent | prev | next [-]

>Imagine you have an AI button. When you click it, the locally running LLM gets a copy of the web site in the context window, and you get to ask it a prompt, e.g. "summarize this".

but.. why? I can read the website myself. That's why I'm on the website.

▲

charcircuit 10 hours ago | parent [-]

People have a limited amount of time, so they may prefer to spend it on something other than what a computer can do for them.

▲

tsimionescu 20 hours ago | parent | prev | next [-]

> When you click it, the locally running LLM gets a copy of the web site in the context window, and you get to ask it a prompt, e.g. "summarize this".

I'm also now imagining my GPU whirring into life and the accompanying sound of a jetplane getting ready for takeoff, as my battery suddenly starts draining visibly.

Local LLMs are a pipe dream; the technology fundamentally requires far too much computation for any true intelligence to ever make sense with current computing technologies.

▲

AuthAuth 20 hours ago | parent | next [-]

Most laptops are now shipping with an NPU for handling these tasks. So it won't be getting computed on your GPU.

▲

tsimionescu 20 hours ago | parent [-]

That doesn't mean anything, it's just a name change. They're the same kind of unit.

And whatever accelerator you try to put into it, you're not running Gemini3 or GPT-5.1 on your laptop, not in any reasonable time frame.

▲

Intermernet 19 hours ago | parent | next [-]

Over the last few decades I've seen people make the same comment about spell checking, voice recognition, video encoding, 3D rendering, audio effects and many more.

I'm happy to say that LLM usage will only actually become properly integrated into background workflows when we have performant local models.

People are madly trying to monetise cloud LLMs before the inevitable rise of local-only LLMs severely diminishes the market.

▲

tsimionescu 12 hours ago | parent | next [-]

Time will tell, but right now we're not solving the problem of running LLMs by increasing efficiency, we're solving it by massive, unprecedented investments in compute power and just power. Companies definitely weren't building nuclear power stations to power their spell checkers or even 3D renderers. LLMs are unprecedented in this way.

▲

Intermernet 5 hours ago | parent [-]

True, but the usefulness of local models is actually getting better. I hope that the current unprecedented madness is a factor of the potential of cloud models, and not a dismissal of the possibility of local models. It's the biggest swing we've seen (with the possible exception of cloud computing vs local virtualisation) but that may be due to recognition of the previous market behaviour, and a desperate need to not miss out on the current boom.

▲

AuthAuth 19 hours ago | parent | prev | next [-]

Also, it does mean something. An NPU is completely different from your 5070. Yes, the 5070 has specific AI cores, but it also has raster cores and other things not present in an NPU.

You don't need to run GPT5.1 to summarize a webpage. Models are small and specialized for different tasks.

▲

tsimionescu 12 hours ago | parent [-]

And all of that is irrelevant for the AI use case. The NPU is at best slightly more efficient than a GPU for this use case, and mostly it's just cheaper by forgoing various parts of a GPU that are not useful for AI (and would not be used during inferencing anyway). And the examples being given of why you'd want AI in your browser are all general text comprehension and conversational discussions about that text, applied to whatever I may be browsing. It doesn't really get less specialized than that.

▲

heavyset_go 13 hours ago | parent | prev [-]

No, NPUs are designed to be power efficient in ways GPUs aren't.

You also don't need Gemini3 or GPT anything running locally.

▲

tsimionescu 12 hours ago | parent [-]

Personally, I don't need AI in my browser at all. But if I did, why would I want to run a crappy model that can't think and hallucinates constantly, instead of using a better model that kinda thinks and doesn't hallucinate quite as often?

▲

heavyset_go 12 hours ago | parent [-]

I generally agree with you, but you'd be surprised at what lower parameter models can accomplish. I've got Nemo 3 running on an iGPU on a shitty laptop with SO-DIMM memory, and it's good enough for my tasks that I have no use for cloud models. Similarly, Granite 4 based models are even smaller, just a couple of gigabytes, and are capable of the automation tasks, summarization, translation, research, etc. that someone might want in a browser. Both do chain of reasoning / "thinking", both are fast, and once NPU support lands in runtimes, they can be offloaded onto more efficient hardware. They certainly aren't perfect, but at least in my experience, fuzzy accuracy / stochastic inaccuracy is good enough for some tasks.

▲

starik36 20 hours ago | parent | prev [-]

That's the point. For things like summarizing a webpage or letting the user ask questions about it, not that much computation is required.

An 8B Ollama model installed on a middle-of-the-road MacBook can do this effortlessly today without whirring. In several years, it will probably be all laptops.

▲

skydhash 19 hours ago | parent | next [-]

But why would you want to summarize a page? If I'm reading a blog, that means that I want to read it, not just a condensed version that might miss the exact information I need for an insight, or invent something that was never there.

▲

AlotOfReading 18 hours ago | parent | next [-]

You can also just skim it. It feels like LLM summarization boils down to an argument to substitute technology for media literacy.

Plus, the latency on current APIs is often on the order of seconds, on top of whatever the page load time is. We know from decades [0] of research that users don't wait seconds.

[0] https://research.google/blog/speed-matters/

▲

CamperBob2 18 hours ago | parent [-]

It makes a big difference when the query runs in a sidebar without closing the tab, opening a new one, or otherwise distracting your attention.

▲

johnnyanmac 17 hours ago | parent [-]

> without closing the tab, opening a new one, or otherwise distracting your attention.

Well, 2/3 is admirable in this day and age.

▲

CamperBob2 18 hours ago | parent | prev [-]

You don't use it to summarize pages (or at least I don't), but to help understand content within a page while minimizing distractions.

For example: I was browsing a Reddit thread a few hours ago and came upon a comment to the effect of "Bertrand Russell argued for a preemptive nuclear strike on the Soviets at the end of WWII." That seemed to conflict with my prior understanding of Bertrand Russell, to say the least. I figured the poster had confused Russell with von Neumann or Curtis LeMay or somebody, but I didn't want to blow off the comment entirely in case I'd missed something.

So I highlighted the comment, right-clicked, and selected "Explain this." Instead of having to spend several minutes or more going down various Google/Wikipedia rabbit holes in another tab or window, the sidebar immediately popped up with a more nuanced explanation of Russell's actual position (which was very poorly represented by the Reddit comment but not 100% out of line with it), complete with citations, along with further notes on how his views evolved over the next few years.

It goes without saying how useful this feature is when looking over a math-heavy paper. I sure wish it worked in Acrobat Reader. And I hope a bunch of ludds don't browbeat Mozilla into removing the feature or making it harder to use.

▲

homebrewer 18 hours ago | parent [-]

And this explanation is very likely to be entirely hallucinated, or worse, subtly wrong in ways that are not obvious if you're not already well versed in the subject. So if you care about the truth even a little bit, you then have to go and recheck everything it has "said".

Why waste time and energy on the lying machine in the first place? Just yesterday I asked "PhD-level intelligence" for a well known quote from a famous person because I wasn't able to find it quickly in wikiquotes.

It fabricated three different quotes in a row, none of them right. One of them was supposedly from a book that doesn't really exist.

So I resorted to a Google search and found what I needed in less time than it took to fight that thing.

▲

CamperBob2 18 hours ago | parent [-]

It cited its sources, which is certainly more than you've done.

> Just yesterday I asked "PhD-level intelligence" for a well known quote from a famous person because I wasn't able to find it quickly in wikiquotes.

In my experience this means that you typed a poorly-formed question into the free instant version of ChatGPT, got an answer worthy of the effort you put into it, and drew a sweeping conclusion that you will now stand by for the next 2-3 years until cognitive dissonance finally catches up with you. But now I'm the one who's making stuff up, I guess.

▲

homebrewer 18 hours ago | parent [-]

Unless you've then read through those sources — and not asked the machine to summarize them again — I don't see how that changes anything.

Judging by your tone and several assumptions based on nothing, I see that you're fully converted. No reason to keep talking past each other.

▲

CamperBob2 17 hours ago | parent [-]

No, I'm not "fully converted." I reject the notion that you have to join one cult or the other when it comes to this stuff.

I think we've all seen plenty of hallucinated sources, no argument there. Source hallucination wasn't a problem 2-3 years ago simply because LLMs couldn't cite their sources at all. It was a massive problem 1-2 years ago because it happened all the freaking time. It is a much smaller problem today. It still happens too often, especially with the weaker models.

I'm personally pretty annoyed that no local model (at least that I can run on my own hardware) is anywhere near as hallucination-resistant as the major non-free, non-local frontier models.

In my example, no, I didn't bother confirming the Russell sources in detail, other than to check that they (a) existed and (b) weren't completely irrelevant. I had other stuff to do and don't actually care that much. The comment just struck me as weird, and now I'm better informed thanks to Firefox's AI feature. My takeaway wasn't "Russell wanted to nuke the Russians," but rather "Russell's positions on pacifism and aggression were more nuanced than I thought. Remember to look into this further when/if it comes up again." Where's the harm in that?

Can you share what you asked, and what model you were using? I like to collect benchmark questions that show where progress is and is not happening. If your question actually elicited such a crappy response from a leading-edge reasoning model, it sounds like a good one. But if you really did just issue a throwaway prompt to a free/instant model, then trust me, you got a very wrong impression of where the state of the art really is. The free ChatGPT is inexcusably bad. It was still miscounting the r's in "Strawberry" as late as 5.1.

▲

tsimionescu 12 hours ago | parent [-]

> I'm personally pretty annoyed that no local model (at least that I can run on my own hardware) is anywhere near as hallucination-resistant as the major non-free, non-local frontier models.

And here you get back to my original point: to get good (or at least better) AI, you need complex and huge models, that can't realistically run locally.

▲

tsimionescu 12 hours ago | parent | prev | next [-]

You can just look down thread at what people actually expect to do - certainly not (just) text summarization. And even for summarization, if you want it to work for any web page (history blog, cooking description, github project, math paper, quantum computing breakthrough), and you want it accurate, you will certainly need way more than Ollama 8B. Add local image processing (since huge amounts of content are not understandable or summarizable if you can't understand images used in the content), and you'll see that for a real 99% solution you need models that will not run locally even in very wild dreams.

▲

johnnyanmac 17 hours ago | parent | prev [-]

Sure. Let's solve our memory crisis without triggering WW3 with China over Taiwan first, and maybe then we can talk about adding even more expensive silicon to increasingly expensive laptops.

▲

pjc50 7 hours ago | parent | prev | next [-]

> The buttons offered to you are "FUCK OFF AND NEVER, EVER BOTHER ME AGAIN"

I've already hit that option before reading the other ones.

> "On search engine result pages and social media sites, use a local LLM to identify headlines, classify them as clickbait-or-not, and for clickbait headlines, automatically fetch the article in an incognito session, and add a small overlay with a non-clickbait version of the title"

Why would you bother fetching the clickbait at all? It's spam.

The main transformation I want out of a browser, the absolutely critical one, is the removal of advertising. I concede that AI might be decent at removing ads and all the overlay clutter that makes news sites unreadable; does anyone have the demo of "AI readability mode"? Crucially I do not want it changing any non-ad text found on the page.

▲

nemomarx 21 hours ago | parent | prev | next [-]

That last one sounds like a lot of churn and resources for little result? You're not really making them sound compelling compared to just blocking clickbait sites with a normal extension. And it could also be an extension users install and configure - why a pop-up offering it to me, and why build it directly into the browser?

▲

MiddleEndian 5 hours ago | parent | prev | next [-]

I like Firefox and, unlike many users here, don't think it's about to collapse, but I have already unchecked "Recommend features as you browse" and "Recommend extensions as you browse" along with setting the welcome page for updates to about:blank.

Ideally the user interface for any tool I use should never change unless I actively prompt it to change, and the only notifications I should get would be from my friends and family contacting me or calendars/alarms that I set myself.

▲

mcjiggerlog 19 hours ago | parent | prev | next [-]

> Imagine you have an AI button. When you click it, the locally running LLM gets a copy of the web site in the context window, and you get to ask it a prompt, e.g. "summarize this".

They basically already have this feature: https://support.mozilla.org/en-US/kb/use-link-previews-firef...

▲

ares623 17 hours ago | parent | prev | next [-]

Lots of imagining here.

▲

gigel82 21 hours ago | parent | prev | next [-]

For any mildly useful AI feature, there are hundreds of entirely dangerous ones. Either way I don't want the browser to have any AI features integrated, just like I don't want the OS to have them.

Especially since we know very well that they won't be locally running LLMs: everyone's plan is to siphon your data to their "cloud hybrid AI" to feed into the surveillance models (for ad personalization, and for selling to scammers, law enforcement and anyone else).

I'd prefer to have entirely separate and completely controlled and fire-walled solutions for any useful LLM scenarios.

▲

username223 16 hours ago | parent | prev | next [-]

> Imagine you have an AI button.

That pretty much sums up the problem: an "AI" button is about as useful to me as a "do stuff" button, or one of those red "that was easy" buttons they sell at Home Depot. Google Translate has offered machine translation for 20+ years that is more or less adequate for understanding text written in a language I don't read. Fine, add a button to do that. Mediocre page summaries? That can live in some submenu. "Agentic" things like booking flights for an upcoming trip? I would never trust an "AI" button to do that.

Machine learning can be useful for well-defined, low-consequence tasks. If you think an LLM is a robot butler, you're fundamentally misunderstanding what you're dealing with.

▲

invl 15 hours ago | parent | prev [-]

I have already clicked the all-caps button