nl a day ago

Gemma-4-E4B-it scored 15/25 on my benchmark, https://sql-benchmark.nicklothian.com/#all-data (agentic SQL generation).

The naming is a bit odd - E4B is "4.5B effective, 8B with embeddings", so despite the name it is probably best compared with the 8B/9B class models and is competitive with them.

Qwen3.5-9B also scores 15/25 in thinking mode, for example. The best 9B model I've found is Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2, which gets to 17/25.

gemma-4-E2B (4-bit quant) scored 12/25, but is really a 5B model. That's the same score as NVIDIA-Nemotron-3-Nano-4B, which is the best 4B model I've found (yes, better than Qwen 4B).

That's a great score for a small model.
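
The scores quoted above can be compared side by side as percentages. A minimal sketch, using only the numbers in this comment; the `results` dict and the `pass_rate` helper are illustrative names, not part of the benchmark itself:

```python
# Scores out of 25, copied from the comment above.
TOTAL = 25
results = {
    "Gemma-4-E4B-it": 15,
    "Qwen3.5-9B (thinking)": 15,
    "Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2": 17,
    "gemma-4-E2B (4-bit quant)": 12,
    "NVIDIA-Nemotron-3-Nano-4B": 12,
}


def pass_rate(score: int, total: int = TOTAL) -> float:
    """Return a score as a percentage of the total."""
    return 100.0 * score / total


# Highest score first; ties keep the order they were listed in.
ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
for model, score in ranked:
    print(f"{model}: {score}/{TOTAL} ({pass_rate(score):.0f}%)")
```

On these numbers the two Gemma 4 models sit at 60% and 48%, level with the Qwen3.5-9B and Nemotron entries they are compared against.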

neonstatic 11 hours ago

Very happy to see updates to your benchmark. Looking forward to inclusion of larger Gemma 4 models!

nl 4 hours ago

The medium one on OpenRouter didn't support tools when I tried it. I will update when there is one that does.

alecthomas a day ago

Oh, this page is great! I just released AIM [1], a tool that generates verified SQL migrations using LLMs, and I tested a bunch of models manually. I think I'll just link to your page too!

[1] https://github.com/alecthomas/aim

GaggiX a day ago

>so despite the name it is probably best compared with the 8B/9B

It runs much faster than a standard 8B/9B model. The name comes from the fact that it uses per-layer embeddings (PLE).