drcongo 3 days ago

As someone who doesn't use anything OpenAI (for all the reasons), I have to agree with the GP. It's all baffling. Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

Once you get to this point you're putting the paradox of choice on the user. I used to use a particular brand of toothpaste for years, until I'd find myself in the supermarket looking at a wall of toothpaste, all by the same brand, with no discernible difference between the products. Why is one of them called "whitening"? Do the others not do that? Why is this one called "complete" and that one called "complete ultra"? That would suggest the "complete" one wasn't actually complete. I stopped using that brand of toothpaste because it became impossible to know which was the right product within the brand.

If I were assessing the AI landscape today, where the leading models are largely indistinguishable in day-to-day use, I'd look at OpenAI's wall of toothpaste and immediately discount them.

tedsanders 3 days ago | parent | next [-]

(I work at OpenAI.)

In ChatGPT, o4-mini is replacing o3-mini. It's a straight 1-to-1 upgrade.

In the API, o4-mini is a new model option. We continue to support o3-mini so that anyone who built a product atop o3-mini can continue to get stable behavior. By offering both, developers can test both and switch when they like. The alternative would be to risk breaking production apps whenever we launch a new model and shut off developers without warning.
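That distinction between a pinned model and the newest one shows up directly in how developers name models in API calls. A minimal sketch of the idea (the dated snapshot id below is the one mentioned elsewhere in this thread; check the live models list before relying on either name):

```python
# A dated snapshot pins behavior for production; an alias may move to
# newer snapshots over time. Both names here are illustrative.

PINNED = "o3-mini-2025-01-31"  # dated snapshot: stable behavior
ALIAS = "o4-mini"              # newer model to test and migrate to

def choose_model(stable: bool) -> str:
    """Pin a snapshot for production; use the alias when testing an upgrade."""
    return PINNED if stable else ALIAS
```

The point is that nothing forces a migration: production code keeps passing the pinned id until the developer has tested the newer model and switches deliberately.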

I don't think it's too different from what other companies do. Like, consider Apple. They support dozens of iPhone models with their software updates and developer docs. And if you're an app developer, you probably want to be aware of all those models and docs as you develop your app (not an exact analogy). But if you're a regular person and you go into an Apple store, you only see a few options, which you can personalize to what you want.

If you have concrete suggestions on how we can improve our naming or our product offering, happy to consider them. Genuinely trying to do the best we can, and we'll clean some things up later this year.

Fun fact: before GPT-4, we had a unified naming scheme for models that went {modality}-{size}-{version}, which resulted in names like text-davinci-002. We considered launching GPT-4 as something like text-earhart-001, but since everyone was calling it GPT-4 anyway, we abandoned that system to use the name GPT-4 that everyone had already latched onto. Kind of funny how our unified naming scheme originally made room for 999 versions, but we didn't make it past 3.

daveguy 3 days ago | parent | next [-]

Have any of the models been deprecated? It seems like a deprecation plan and definition of timelines would be extraordinarily helpful.

I have not seen any sort of "If you're using X.122, upgrade to X.123 before 202X. If you're using X.120, upgrade to anything else before April 2026, because that model will no longer be available on that date." ... like operating systems and hardware manufacturers have been doing for decades.

Side note: it's amusing that stable behavior is only available on a particular model with a sufficiently low temperature setting. As near-AGI, shouldn't these models be smart enough to maintain consistency, or improve, from version to version?

tedsanders 3 days ago | parent [-]

Yep, we have a page of announced API deprecations here: https://platform.openai.com/docs/deprecations

It's got all deprecations, ordered by date of announcement, alongside shutdown dates and recommended replacements.

Note that we use "deprecated" to mean slated for shutdown, and "shutdown" to mean actually shut down.

In general, we try to minimize developer pain by supporting models for as long as we reasonably can, and we'll give a long heads up before any shutdown. (GPT-4.5-preview was a bit of an odd case because it was launched as a potentially temporary preview, so we only gave a 3-month notice. But generally we aim for much longer notice.)

meander_water 3 days ago | parent [-]

On that page I don't see any mention of o3-mini. Is o3-mini now a legacy model that's slated to be deprecated later on?

tedsanders 3 days ago | parent [-]

Nothing announced yet.

Our hypothesis is that o4-mini is a much better model, but we'll wait to hear feedback from developers. Evals only tell part of the story, and we wouldn't want to prematurely deprecate a model that developers continue to find value in. Model behavior is extremely high dimensional, and it's impossible to prevent regression on 100% of use cases/prompts, especially if those prompts were originally tuned to the quirks of the older model. But if the majority of developers migrate happily, then it may make sense to deprecate at some future point.

We generally want to give developers as stable an experience as possible, and not force them to swap models every few months whether they want to or not. Personally, I want developers to spend >99% of their time thinking about their business and <1% of their time thinking about what the OpenAI API is requiring of them.

dmd 3 days ago | parent | prev [-]

Any idea when v1/models will be updated? As of right now, https://api.openai.com/v1/models has "id": "o3-mini-2025-01-31" and "id": "o3-mini", but no just 'o3'.
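(For anyone scripting this kind of check: matching on exact ids matters, since names like "o3-mini" would otherwise shadow a substring search for "o3". A minimal sketch, assuming the standard list payload shape with a top-level "data" array of objects carrying "id" fields:)

```python
import json

def has_model(models_json: str, model_id: str) -> bool:
    """Return True if the exact model id appears in a /v1/models response."""
    payload = json.loads(models_json)
    return any(m.get("id") == model_id for m in payload.get("data", []))

# Illustrative payload mirroring the response described above.
sample = '{"data": [{"id": "o3-mini"}, {"id": "o3-mini-2025-01-31"}]}'
```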

tedsanders 3 days ago | parent [-]

Ah, I know this is a pain, but by default o3 is only available to developers on tiers 4–5.

If you're in tiers 1–3, you can still get access - you just need to verify your org with us here:

https://help.openai.com/en/articles/10910291-api-organizatio...

I recognize that verification is annoying, but we eventually had to resort to it: otherwise bad actors create zillions of accounts to violate our policies and/or avoid paying via credit card fraud, etc.

dmd 3 days ago | parent [-]

Aha! Verified and now I see o3. Thanks.

petesergeant 3 days ago | parent | prev | next [-]

> Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

Because if they removed access to o3-mini — which I have tested, costed, and built around — I would be very angry. I will probably switch to o4-mini when the time is right.

TuxSH 3 days ago | parent [-]

They just did that, at least for chat

petesergeant 2 days ago | parent [-]

It seems clear to me that I would have built an app around the API, not the chat window.

mkozlows 3 days ago | parent | prev | next [-]

They keep a lot of models around for backward compatibility for API users. This is confusing, but not inherently a bad idea.

louthy 3 days ago | parent | prev [-]

You could develop an AI model to help pick the correct AI model.

Now you’ve got 18 problems.

skygazer 3 days ago | parent [-]

I think you're trying to re-contextualize the old Standards joke, but I actually think you're right: if a front-end model could dispatch each prompt to the best backend model, turning the whole thing into a high-level mixture of models, that would be a great simplifying step. Then they can specialize and optimize all they want, compute goes down, responses get better, and we only ever see one interface.
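A toy version of that dispatch layer, with made-up model names and heuristics purely to illustrate the idea:

```python
def route(prompt: str) -> str:
    """Pick a backend model for a prompt with a cheap heuristic (illustrative only)."""
    p = prompt.lower()
    if any(k in p for k in ("prove", "step by step", "debug")):
        return "reasoning-model"   # slower, stronger at multi-step work
    if len(prompt) < 200:
        return "small-fast-model"  # cheap default for short requests
    return "general-model"         # everything else
```

In practice the router would itself be a model (or a learned classifier), not keyword matching, but the interface is the same: one entry point, many specialized backends behind it.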

louthy 3 days ago | parent | next [-]

> I think you're trying to re-contextualize the old Standards joke

Regex joke [1], but the standards joke will do just fine also :)

[1] Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

calmoo 3 days ago | parent | prev [-]

Isn't this basically the idea of agents?