nl 7 hours ago

I think it's useful to be realistic about what you can do with a local model, especially something as small as the 9B the author is using. A 9B model is around the level of Sonnet 3.6 - it can do autocomplete and small functions but it loses track trying to understand large problems.

But they are interesting and fun to play with! I do a LOT of work on local agent harnesses etc, mostly for fun.

My current project is a zero-install agent: https://gemma-agent-explainer.nicklothian.com/ - Python, SQL, and React all run completely in the browser. Gemma E4B is recommended for the best experience!

This is under heavy development and needs Chrome for both Filesystem API support and LiteRT (although most Chromium-based browsers can be made to work with it).

It's different to most agents because it is zero install: the model runs in the browser using LiteRT/LiteLLM (which gives better performance than Transformers.js), and the Filesystem API gives it optional sandboxed access to a directory to read from.
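For context on what that sandboxed directory access looks like in practice, here is a minimal sketch using Chrome's File System Access API. This is illustrative only, not the project's actual code; `collectFiles` is a hypothetical helper, written against anything that implements the async `entries()` iterator so the logic can be exercised outside a browser:

```javascript
// Recursively list file paths under a directory handle.
// Works with a real FileSystemDirectoryHandle (Chrome) or any mock
// object exposing { kind, entries() } with the same shape.
async function collectFiles(dirHandle, prefix = "") {
  const files = [];
  for await (const [name, handle] of dirHandle.entries()) {
    const path = prefix + name;
    if (handle.kind === "file") {
      files.push(path);
    } else if (handle.kind === "directory") {
      // Descend into subdirectories, extending the path prefix.
      files.push(...(await collectFiles(handle, path + "/")));
    }
  }
  return files;
}

// In the browser, access is granted explicitly by the user:
//   const root = await window.showDirectoryPicker();
//   const paths = await collectFiles(root);
```

The key design point is that the agent never sees the filesystem by default: `showDirectoryPicker()` prompts the user, and the agent's view is scoped to that one directory subtree.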

It is self documenting - you can ask questions like "How is the system prompt used" in the live help pane and it has access to its own source code.

There's quite a lot there: press "Tour" to see it all.

Will be open source next week.

furyofantares 3 hours ago | parent | next [-]

But I was doing a lot more than autocomplete and small functions with Sonnet 3.5.

potatoman22 3 hours ago | parent | prev | next [-]

Not to be nitpicky, but many of the 4-12B models are somewhere between GPT-3.5 and GPT-4o-mini. It's hard to find a good comparison though, because the benchmarks people score models against change so often. For reference, Sonnet 3.6 came out about a year after GPT-3.5.

nl 2 hours ago | parent [-]

Don't worry about being nitpicky! I'm going to out-nitpick you....

Actually....

I write and publish my own benchmark for this stuff. It's an agentic SQL benchmark which isn't in the training data yet, and which I've found can separate frontier models from close followers (the only models to get 100% are Opus 4.6 and GPT 5.5).

The best small model I've found is a fine-tune of Opus-3.5 9B which scores 18/25: https://sql-benchmark.nicklothian.com/?highlight=Jackrong_Qw...

Haiku 4.5 scores 20/25, and Haiku is certainly better than Sonnet 3.6. GPT 3.5 scores 13/25.

ai_fry_ur_brain 7 hours ago | parent | prev [-]

[flagged]

nl 7 hours ago | parent [-]

I think knowledge is power.

I think that the more people who try local models (especially the larger ones) the better.

I sometimes get the impression that many people claiming that local models are as good as frontier models work in "token poor" environments. If you can't build large-scale programs using at least Opus 4.5+ then it's difficult to compare. They compare something like Qwen 27B with Sonnet and see that it is nearly as good, but miss that the frontier models are a lot better.

That knowledge is power, too.

I personally can help make local models more accessible. I can't make Opus cheaper.

bachmeier 7 hours ago | parent [-]

> I sometimes get the impression that many people claiming that local models are as good as frontier models work in "token poor" environments. If you can't build large-scale programs using at least Opus 4.5+ then it's difficult to compare.

I sometimes get the impression that people posting comments on HN don't realize that LLMs do more than vibe coding.

BubbleRings 6 hours ago | parent [-]

Yeah no kidding. For instance, if you are an independent inventor trying to write a patent while keeping your patent lawyer expenses to a minimum, you want to write as much of the first draft(s) of the patent as you can yourself. (You’ll save billable hours with your patent lawyer, and you’ll end up with a better patent because you’ll communicate your innovations more clearly to your lawyer.)

However, and this is the big thing, you absolutely do not want to be asking a SOTA LLM for help with the language in your patent application. This is because describing your invention to a web-based LLM could be considered a public “disclosure” of your invention, which (after a one-year grace period goes by) could put your invention in the public domain, basically—and thereby prevent you (or anyone else) from being able to ever patent the invention. Plus, you know, a random unscrupulous employee at the SOTA company could be reviewing logs and notice your great idea, and file a patent on it before you do, and remember, the United States patent office went to “first to file” in 2013.

Oh, and don’t take legal advice from random people on the internet, by the way.

solenoid0937 6 hours ago | parent [-]

> This is because describing your invention to a web based LLM could be considered a public “disclosure” of your invention, which, (after a one year grace period goes by), could put your invention in the public domain, basically—and thereby prevent you (or anyone else) from being able to ever patent the invention.

This is simply not true. Even if it were true (and again, it's not) you could simply use zero data retention APIs.

No one at the big model companies is trawling through your chats to steal your patents. It's not only illegal and against their own terms of service, but these people have better things to do with their time.