nl a day ago
Gemma-4-E4B-it scored 15/25 on my agentic SQL generation benchmark (https://sql-benchmark.nicklothian.com/#all-data). The naming is a bit odd: E4B is "4.5B effective, 8B with embeddings", so despite the name it is probably best compared with the 8B/9B class models, and it is competitive with them. Qwen3.5-9B also scores 15/25 in thinking mode, for example. The best 9B model I've found is Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2, which gets to 17/25.

gemma-4-E2B (4-bit quant) scored 12/25, but is really a 5B model. That's the same score as NVIDIA-Nemotron-3-Nano-4B, which is the best 4B model I've found (yes, better than Qwen 4B). That's a great score for a small model.
neonstatic 11 hours ago
Very happy to see updates to your benchmark. Looking forward to the inclusion of the larger Gemma 4 models!
alecthomas a day ago
Oh, this page is great! I just released AIM [1], a tool that generates verified SQL migrations using LLMs, and I tested a bunch of models manually. I think I'll just link to your page too!
GaggiX a day ago
> so despite the name it is probably best compared with the 8B/9B

It runs much faster than a standard 8B/9B model. The name comes from the fact that it uses per-layer embedding (PLE).