yberreby 6 days ago
> Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o, and we will continue to incorporate more with future releases.

The lack of availability in ChatGPT is disappointing, and they're playing on ambiguity here. They are framing this as if it were unnecessary to release 4.1 on ChatGPT, since 4o is apparently great, while simultaneously showing how much better 4.1 is relative to GPT-4o.

One wager is that the inference cost is significantly higher for 4.1 than for 4o, and that they expect most ChatGPT users not to notice a marginal difference in output quality. API users, however, will notice. Alternatively, 4o might have been aggressively tuned to be conversational while 4.1 is more "neutral"? I wonder.
Tiberium 6 days ago
There's a HUGE difference that you are not mentioning: there are "gpt-4o" and "chatgpt-4o-latest" on the API. The former is the stable version (there are a few snapshots, but the newest snapshot has been there for a while), and the latter is the fine-tuned version that they often update on ChatGPT. All those benchmarks were done for the API stable version of GPT-4o, since that's what businesses rely on, not on "chatgpt-4o-latest".
themanmaran 6 days ago
I disagree. From the average user's perspective, it's quite confusing to see half a dozen models to choose from in the UI. In an ideal world, ChatGPT would just abstract away the decision, so I don't need to be an expert in the relatively minor differences between each model to have a good experience. In the API, by contrast, I want very strict versioning of the models I'm using, which lets me run my own evals and pick the model that works best.
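To make the "run my own evals and pick the model" part concrete, here is a rough sketch of what that could look like. Everything in it is an assumption for illustration: the eval cases, the substring-match scoring, and the two stand-in functions, which fake model responses so the snippet runs offline. In practice each callable would wrap a real API call pinned to a specific model identifier.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical eval case: (prompt, text we expect to find in the answer).
EvalCase = Tuple[str, str]
AskFn = Callable[[str], str]


def score_model(ask: AskFn, cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected text."""
    if not cases:
        return 0.0
    hits = sum(
        1 for prompt, expected in cases
        if expected.lower() in ask(prompt).lower()
    )
    return hits / len(cases)


def pick_best(models: Dict[str, AskFn], cases: List[EvalCase]) -> Tuple[str, Dict[str, float]]:
    """Score every pinned model on the same cases and return the top scorer."""
    scores = {name: score_model(ask, cases) for name, ask in models.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Stand-ins for real API calls (assumption: canned answers, no network).
def fake_stable(prompt: str) -> str:
    return {"2+2?": "4", "Capital of France?": "Paris"}.get(prompt, "")


def fake_latest(prompt: str) -> str:
    return {"2+2?": "Sure! The answer is 4."}.get(prompt, "I'm not sure")


CASES: List[EvalCase] = [("2+2?", "4"), ("Capital of France?", "paris")]

# Keys are the model identifiers mentioned in this thread; the behavior
# attached to each is invented for the example.
MODELS: Dict[str, AskFn] = {
    "gpt-4o": fake_stable,
    "chatgpt-4o-latest": fake_latest,
}

if __name__ == "__main__":
    best, scores = pick_best(MODELS, CASES)
    print(best, scores)
```

The point of keying everything on an explicit model identifier is that the eval result stays meaningful only as long as the model behind that name doesn't change underneath you, which is the case for strict versioning in the API.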