vessenes | 3 hours ago
Interesting that they get to rev this with the release of a new Flash model. I'm speculating that part of the distil pipeline includes the image gen stuff; if true, that's internal tooling that will pay dividends over time: new frontier model -> automatic new image model. Even if it's just incremental updates, that's good for both product cadence and compounding improvements.
WarmWash | 3 hours ago
The confusion here is dense: 3.1 Flash Image is not 3.1 Flash. The banana models (image) are different from the mainline models, but they confusingly share the same naming scheme.
NitpickLawyer | 3 hours ago
> the distil pipeline

I don't have inside info, but everything we've seen about Gemini 3.0 makes me think they aren't doing distillation for their models. They are likely training different arches/sizes in parallel. Gemini 3.0-flash was better than 3.0-pro on a bunch of tasks, and that shouldn't happen with distillation. So my guess is that they are working in parallel on different arches, trying stuff out on -flash first (since those models are smaller and faster to train) and then applying the learnings to the -pro training runs. (The same thing kinda happened with 2.5-flash, which got better upgrades than 2.5-pro at various points last year.) Of course I might be wrong, but that's my guess right now.
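For context on why a distilled student rarely beats its teacher: the standard distillation loss trains the student to match the teacher's output distribution, so the teacher's behavior is effectively the ceiling. A minimal sketch, assuming PyTorch; the function and tensor names here are illustrative, not anything from Google's actual pipeline:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature, then push the
        # student's distribution toward the teacher's via KL divergence.
        t = temperature
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        # kl_div expects log-probs as input and probs as target; scaling by
        # t^2 keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * (t * t)

    # Hypothetical usage: a batch of 8 examples over a 32k-token vocab.
    s_logits = torch.randn(8, 32000)
    t_logits = torch.randn(8, 32000)
    loss = distillation_loss(s_logits, t_logits)

Since the student is optimized purely to reproduce the teacher's (softened) probabilities, a -flash model systematically outperforming -pro points to independent training runs rather than pure distillation, which is the parent's argument.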