babl-yc 4 hours ago
Interesting that 3.5 Flash is launching before 3.5 Pro. Historically it's been the reverse for Gemini, since Flash is distilled from Pro? Are they just training Pro a bit longer until it tops the benchmarks?
londons_explore 3 hours ago
3.5 Flash is presumably cheaper to run than Pro, too... Perhaps the company is compute-constrained like everyone else?
| |||||||||||||||||
kivle 2 hours ago
It must have improved considerably since I tried "3.5-flash-preview" a couple of months ago, if all the claims in the presentations are true. When I tried it, it couldn't even make changes to a 200-line Python script without making major mistakes (like messing up the argument order when calling functions).
aykutseker 3 hours ago
flash beating the pro it was distilled from is suspicious, not surprising. distillation usually loses you something. if the smaller model is winning on agentic evals, the more likely read is that the evals weren't measuring agent quality in the first place. that's the bigger problem for builders, not which model to pick.
| |||||||||||||||||