xnx 4 hours ago

> increasing the number for such a minor change is not a move in the right direction

A .1 model number increase seems reasonable for more than doubling ARC-AGI 2 score and increasing so many other benchmarks.

What would you have named it?

Topfi 2 hours ago

My issue is that we haven't even gotten the release version of 3.0, which is also still in Preview, so may stick with 3.0 till that has been deemed stable.

Basically, what does the word "Preview" mean, if newer releases happen before a Preview model is stable? In prior Google models, Preview meant that there'd still be updates and improvements to said model prior to full deployment, something we saw with 2.5. Now, there is no meaning or reason for this designation to exist if they forgo a 3.0 still in Preview for model improvements.

xnx 2 hours ago

Given the pace AI is improving and that it doesn't give the exact same answers under many circumstances, is the [in]stability of "preview" a concern?

Gmail was in "beta" for 5 years.

verdverm 2 hours ago

ChatGPT 4.5 was never released to the public, but it is widely believed to be the foundation the 5.x series is built on.

Wonder how GP feels about the minor bumps for other model providers?