coldtea 4 hours ago
> But then Gemma 4 proved to be extraordinarily good for its size (better than Qwen), and kinda disproved that US models are any weaker at small sizes.

Did it "disprove" it retroactively, or did it just change the situation, given that until then they were indeed weaker at small sizes?
SwellJoe 4 hours ago
I don't know. I think it proves that if Google is baking guardrails into its models that stop them from finding security bugs, it didn't bake those guardrails into Gemma 4, because Gemma 4 is very good at it. Maybe that means Google devs had a change of heart. Maybe something about the Gemma 4 architecture suits this task better than Gemini 3.1 Pro does. Gemini Flash 3.5 did OK, though. Anyway, based on my experience so far, I kinda think Fable is the only US model that really tries to block security work like this.