| ▲ | zhyder 4 hours ago | |||||||
Surprisingly big jump in ARC-AGI-2 from 31% to 77%, guess there's some RLHF focused on the benchmark given it was previously far behind the competition and is now ahead. Apart from that, the usual predictable gains in coding. Still is a great sweet-spot for performance, speed and cost. Need to hack Claude Code to use their agentic logic+prompts but use Gemini models. I wish Google also updated Flash-lite to 3.0+, would like to use that for the Explore subagent (which Claude Code uses Haiku for). These subagents seem to be Claude Code's strength over Gemini CLI, which still has them only in experimental mode and doesn't have read-only ones like Explore. | ||||||||
| ▲ | WarmWash 4 hours ago | parent [-] | |||||||
>I wish Google also updated Flash-lite to 3.0+ I hope every day that they have made gains on their diffusion model. As a sub agent it would be insane, as it's compute light and cranks 1000+ tk/s | ||||||||
| ||||||||