pmarreck 9 hours ago

Impressive model, for sure. I've been running it on my Mac, and now I get to have it locally on my iPhone? I need to test this. Wait, it does agent skills and mobile actions, all local to the phone? Whaaaat? (Have to check it out later! Anyone have any tips yet?)

I don't normally do the whole "abliterated" thing (dealignment) but after discovering https://github.com/p-e-w/heretic , I was too tempted to try it with this model a couple days ago (made a repo to make it easier, actually) https://github.com/pmarreck/gemma4-heretical and... Wow. It worked. And... Not having a built-in nanny is fun!

It's also possible to make an MLX version of it, which runs a little faster on Macs, but won't work through Ollama unfortunately. (LM Studio maybe.)

Runs great on my M4 MacBook Pro w/128GB and likely also runs fine under 64GB... machines with less memory might require lower quantizations.
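For anyone sizing this for their own machine: weight memory is roughly parameters × bits-per-weight / 8 bytes, plus runtime overhead (KV cache, activations) that this back-of-envelope sketch ignores. Using the ~31B parameter count mentioned downthread:

```python
# Back-of-envelope weight-memory estimate at different quantization levels.
# Ignores KV cache and activation overhead, which vary by runtime and context size.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * (bits / 8) bytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"31B @ {bits}-bit: ~{weight_gb(31, bits):.1f} GB")
# 16-bit is ~62 GB of weights alone (hence the 128GB machine); 4-bit is ~15.5 GB
```

So a 4-bit quant should squeeze into a 32GB machine with room left for context, while 8-bit really wants 64GB+.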

I specifically like dealigned local models because if I have to get my thoughts policed when playing in someone else's playground, like hell am I going to be judged while messing around in my own local open-source one too. And there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, at a level never possible before.

Note: I tried to hook this one up to OpenClaw and ran into issues.

To answer the obvious question: yes, this sort of thing enables bad actors more (as do many other tools). Fortunately, there are far more good actors out there, and bad actors don't listen to rules that good actors subject themselves to, anyway.

c2k 8 hours ago | parent | next [-]

I run MLX models with omlx[1] on my Mac and it works really well.

[1] https://github.com/jundot/omlx

pmarreck 3 hours ago | parent [-]

Holy hell, how new is this? I've never heard of it, looks great!

nothinkjustai 3 hours ago | parent [-]

It’s completely vibe coded, doesn’t even run on my Mac lol

barbazoo 8 hours ago | parent | prev | next [-]

> And there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, at a level never possible before.

I checked the abliterate script and I don't yet understand what it does or what the result is. What are the conversations this enables?

SL61 6 hours ago | parent | next [-]

LLMs are very helpful for transcribing handwritten historical documents, but sometimes those documents contain language/ideas that a perfectly aligned LLM will refuse to output. Sometimes as a hard refusal, sometimes (even worse) by subtly cleaning up the language.

In my experience the latest batch of models are a lot better at transcribing the text verbatim without moralizing about it (i.e. at "understanding" that they're fulfilling a neutral role as a transcriber), but it was a really big issue in the GPT-3/4 era.

dolebirchwood 5 hours ago | parent [-]

I have a project where I'm using LLMs to parse data from PDFs with a very complicated tabular layout. I've been using the latest Gemini models (flash and pro) for their strong visual reasoning, and they've generally been doing a really good job at it.

My prompt states that their job is to extract the text exactly as it appears in the PDF. One data point to be extracted is the race of each person listed. In one case, someone's race was "Indian". Gemini decided to extract it as "Native American". So ridiculous.

janalsncm 5 hours ago | parent | next [-]

According to Gemini, Native America is the most populous country.

devmor 4 hours ago | parent | prev [-]

I was attempting to help someone who runs a small shop selling restored clothing set up a gemini pipeline that would restage images she took of clothing items with bad lighting, backgrounds, etc.

Basically anything that showed any “skin” on a mannequin it would refuse to interact with. Even just a top, unless she put pants on the mannequin.

It was infuriating.

spijdar 8 hours ago | parent | prev | next [-]

Realistically, a lot of people do this for porn.

In my experience, though, it's necessary to do anything security related. Interestingly, the big models have fewer refusals for me when I ask e.g. "in <X> situation, how do you exploit <Y>?", but local models will frequently flat out refuse, unless the model has been abliterated.

tredre3 6 hours ago | parent [-]

From what I've seen, Gemma 4 doesn't refuse a lot regarding sex; it only needs a little nudging in the right direction sometimes.

But it does refuse to be critical of the usual topics: Israel, Islam, trans issues, or race.

So wanting to discuss one of those is the real reason people would use an uncensored model.

throwuxiytayq 8 hours ago | parent | prev | next [-]

The in-ter-net is for porn

rav3ndust 7 hours ago | parent [-]

that song is going to be stuck in my head all day now. lol

golem14 2 hours ago | parent [-]

That whole musical is just fantastic!

pmarreck 7 hours ago | parent | prev [-]

1) Coming up with any valid criticism of Islam at all (for some reason, criticisms of Christianity or Judaism are perfectly allowed even with public models!).

2) Asking questions about sketchy things. Simply asking should not be censored.

3) I don't use it for this, but porn or foul language.

4) Imitating or representing a public figure is often blocked.

5) Asking security-related questions when you are trying to do security.

6) For those who have experienced them: trying to use AI to deal with traumatic experiences that are illegal to even describe.

Many other instances.

tshaddox 3 hours ago | parent | next [-]

> Coming up with any valid criticism of Islam at all (for some reason, criticisms of Christianity or Judaism are perfectly allowed even with public models!).

When’s the last time you tried this? ChatGPT and Gemini have no trouble responding with all the common criticisms of Islam.

peyton 6 hours ago | parent | prev [-]

The manufacturing of biologics can be heavily censored to an absurd degree. I don’t know about Gemma 4 in particular.

pmarreck 4 hours ago | parent [-]

Really? That's fascinating. Why is that?

eloisant 7 hours ago | parent | prev | next [-]

I tried it on my Mac, for coding, and I wasn't really impressed compared to Qwen.

I guess there are things it's better at?

nkohari 6 hours ago | parent [-]

You're comparing apples to oranges there. Qwen 3.5 is a much larger model at 397B parameters vs. Gemma's 31B. Gemma will be better at answering simple questions and doing basic automation, and codegen won't be its strong suit.

kgeist 6 hours ago | parent | next [-]

Qwen 3.5 comes in various sizes (including 27B), and judging by the posts on HN, /LocalLlama, etc., it seems to be better at logic/reasoning/coding/tool calling compared to Gemma 4, while Gemma 4 is better at creative writing and world knowledge (basically nothing changed from the Qwen 3 vs. Gemma 3 era).

Mil0dV 6 hours ago | parent [-]

Does this also apply to Gemma's 26B-A4B vs., say, Qwen's 35B-A3B?

I'm not sure if I can make the 35B-A3B work with my 32GB machine.

tredre3 6 hours ago | parent | prev [-]

Gemma 4 31B is still not impressive at coding compared to even Qwen 3.5 27B. It's just not its strong suit.

So far Gemma 4 seems excellent at role playing and document analysis, and decent at making agentic decisions.

gigatexal 6 hours ago | parent [-]

This has been my experience as well, Qwen via Ollama locally has been very very impressive.

magospietato 8 hours ago | parent | prev | next [-]

Haven't built anything on the agent skills platform yet, but it's pretty cool imo.

On Android the sandbox loads an index.html into a WebView, with standardized string I/O to the harness via some window properties. You can even return a rendered HTML page.

Definitely hacked together, but it feels like an indication of what an edge-compute agentic sandbox might look like in the future.
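A minimal sketch of what that contract could look like (the window property names here are hypothetical, not the actual harness API):

```javascript
// Hypothetical skill script as it might appear in the sandbox's index.html:
// the harness writes a request string into a window property, the skill
// reads it, and hands back a string (possibly rendered HTML) via another
// property. All names here are illustrative, not the real API.
function handleRequest(win) {
  const input = JSON.parse(win.__harnessInput); // harness-provided string
  // Do the skill's work; here we just render a trivial HTML response.
  win.__harnessOutput = `<p>Hello, ${input.name}</p>`;
  return win.__harnessOutput;
}

// Outside a WebView, a plain object stands in for `window`:
const fakeWindow = { __harnessInput: JSON.stringify({ name: "world" }) };
console.log(handleRequest(fakeWindow)); // <p>Hello, world</p>
```

Keeping the I/O to plain strings is what makes the WebView a cheap sandbox: the harness never has to marshal structured objects across the bridge.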

bossyTeacher 6 hours ago | parent | prev | next [-]

>there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, at a level never possible before.

Mind giving us a few of the examples that you plan to run in your local LLM? I am curious.

pmarreck 4 hours ago | parent [-]

I'm not sure what you're getting at, but I already gave a set of questions that are ethically legitimate yet routinely censored by the public models:

https://news.ycombinator.com/item?id=47654013

Not to mention that doing what the big model makers do literally dumbs the model down.

They should at least allow something like letting you prove your age and identity to give you access to better/unaligned models, maybe even requiring a license of some sort. Because you know what? SOMEONE in there absolutely has access to the completely uncensored versions of the latest models.

satvikpendem 3 hours ago | parent [-]

I tried 1 and a few others with hypothetical situations; the public models answer perfectly fine, it looks like.

3yr-i-frew-up 3 hours ago | parent | prev | next [-]

[dead]

jackp96 8 hours ago | parent | prev [-]

[flagged]

potsandpans 8 hours ago | parent [-]

I'm tired of this concern trolling.

jackp96 8 hours ago | parent | prev [-]

[flagged]