tedsanders 5 hours ago

Yeah, for a while ChatGPT Plus has been powered by two series of models under the hood.

One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.

The second series is the Thinking series, which is more accurate and more tuned to professional knowledge work, but slower (because it uses more reasoning tokens).

We'd also prefer to have a simple experience with just one option, but picking just one would pull back the Pareto frontier for some group of people/preferences. So for now we continue to serve two models, with manual control for people who want to choose and an imperfect auto-switcher for people who don't want to be bothered. Could change down the road - we'll see.

(I work at OpenAI.)
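The split described above (manual control, with an auto-switcher as the default) could be sketched roughly as follows. This is a hypothetical illustration, not OpenAI's actual routing logic; the series names and the keyword heuristic are stand-ins.

```python
# Hypothetical sketch of a two-series setup with manual override and an
# imperfect auto-router. Names and heuristic are illustrative only.

def auto_route(prompt: str) -> str:
    """Crude stand-in for a learned router: long or analytical prompts
    go to the slower, more accurate series."""
    analytical = any(w in prompt.lower() for w in ("prove", "analyze", "debug"))
    return "thinking" if analytical or len(prompt) > 500 else "instant"

def pick_model(prompt: str, user_choice=None) -> str:
    """Manual choice wins; otherwise fall back to the auto-switcher."""
    if user_choice in ("instant", "thinking"):
        return user_choice
    return auto_route(prompt)
```

The point of the structure is that the router only runs when the user hasn't pinned a series, which is exactly the "manual control plus auto-switcher" compromise described in the comment.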

vessenes 2 hours ago | parent | next [-]

By the way, I imagine you know this, but the product split is not obvious, even to my 20-something kids who are Plus subscribers - I saw one of them chatting with the instant model recently and I was like "No!! Never do that!!", and they did not understand they were getting the (I'm sorry to say) much less capable model.

I think it's confusing enough that it's a brand harm. I offer no solutions, unfortunately. I guess you could do a little post hoc analysis for Plus subscribers and up and determine whether they'd benefit from defaulting to Thinking mode; that could be done relatively cheaply at low-utilization times. But maybe you need this to keep utilization where it's at -- either way, I think it ends up meaning my kids prefer Claude. Which is fine; they wouldn't prefer Haiku if it were the default, but they don't get Haiku, they get Sonnet or Opus.

pants2 2 hours ago | parent [-]

I agree -- we're on the ChatGPT Enterprise plan at work and every time someone complains about it screwing up a task it turns out they were using the instant model. There needs to be a way to disable it at the bare minimum.

lifis 4 hours ago | parent | prev | next [-]

You could perhaps show the "instant" reply right away and provide a button labeled "Think longer and give me a better answer" that starts the thinking model and eventually replaces the answer.

For this to work well, the instant reply must be truly instant, and the button must always be visible and in the same position on the screen (i.e. either at the top or bottom of the answer, scrolled so that it is also at the top or bottom of the screen). Once the thinking answer is displayed, there should be a small icon button to show the previous instant answer.
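The escalation flow being proposed can be sketched in a few lines. This is a hypothetical illustration of the UI logic only; `generate_instant`, `generate_thinking`, and `want_better` are stand-ins for the model calls and the button press.

```python
# Sketch of the "instant first, upgrade on demand" flow described above.
# The callables are hypothetical stand-ins for model calls and UI events.

def answer(prompt, generate_instant, generate_thinking, want_better):
    """Show the instant reply immediately; if the user presses the
    'think longer' button, replace it with the thinking model's answer
    while keeping the instant one available for comparison."""
    instant = generate_instant(prompt)
    if not want_better():
        return {"shown": instant, "previous": None}
    better = generate_thinking(prompt)
    return {"shown": better, "previous": instant}
```

Note that the instant answer is retained in `previous` rather than discarded, matching the suggestion of a small icon button to show it after the upgrade.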

michaelmrose 3 hours ago | parent [-]

Wouldn't this be 1.5x as expensive?

jimbokun 2 hours ago | parent [-]

Not if the Instant answer is sufficient.

resters 2 hours ago | parent [-]

That's assuming that the instant answer is even directionally correct. A misleading instant answer could pollute the context and lead the thinking model astray.

ssl-3 an hour ago | parent [-]

Can the context of the pre-revision Instant response simply be discarded -- or forked or branched or [insert appropriate nomenclature here] -- instead of being included as potential poison?

(It seems absurd to consider that there may be no undo button that the machine can push.)
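At the API level this kind of undo is mechanically simple: chat context is just a list of messages the client resends with each request, so the instant reply can be left out of the messages sent to the thinking model. A minimal sketch, assuming message dicts in the common role/content shape:

```python
# Chat history is client-side state: to "undo" the instant answer, just
# don't include it in the messages sent to the thinking model.
# (Hypothetical sketch; dicts mirror common chat-API message shapes.)

def branch_without_last(history: list, drop_role: str = "assistant") -> list:
    """Fork the conversation, discarding the most recent message from
    drop_role (e.g. a misleading instant reply) so it can't pollute
    the thinking model's context. The original history is untouched."""
    forked = list(history)
    for i in range(len(forked) - 1, -1, -1):
        if forked[i]["role"] == drop_role:
            del forked[i]
            break
    return forked
```

Whether the ChatGPT product surfaces such a branch is a separate question; the sketch only shows that nothing in the underlying request format forces the poisoned answer to stay in context.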

bananaflag 30 minutes ago | parent | prev | next [-]

Before GPT-5 was launched, and after sama had said they would unify the ordinary and reasoning models, I think we all expected more than an (auto-)switcher. We expected some small innovation (smaller than the ordinary-to-reasoning one, but still a significant one) that would make both kinds of replies be, in a way, generated by a single model. I don't know exactly how - I expected OpenAI to surprise us with something that would feel obvious in retrospect.

redox99 an hour ago | parent | prev | next [-]

Auto will never work, because for the exact same prompt sometimes you want a quick answer because it's not something very important to you, and sometimes you want the answer to be as accurate as possible, even if you have to wait 10 minutes.

In my case it would be more useful to have a slider of how much I'm willing to wait. For example instant, or think up to 1 minute, or think up to 15 minutes.

cj 37 minutes ago | parent [-]

They have an "answer now" button that stops the reasoning and starts the reply. Same with Gemini.

redox99 31 minutes ago | parent [-]

Yeah, I use that, but it's not really a solution that would allow having only Auto. It doesn't help when it chooses Instant instead of Thinking, and it's also much slower than using Instant outright because the Skip button doesn't show immediately, and it's generally slow to restart.

xiphias2 an hour ago | parent | prev | next [-]

Is there a way to get sticky model selection back, or is the reason that it's just too expensive to serve alternative models?

For coding I love codex-5.3-xhigh, but for non-coding prompts I still far prefer o3 even if it's considered a legacy model.

I can imagine that its heavier tool use is too expensive to serve, but as a Pro user I would love for it to come back.

Flux159 3 hours ago | parent | prev | next [-]

Thanks for clarifying! I guess the default for most users is going to be to use the router / auto switcher which is fine since most people won't change the default.

Just noting that I'm not against differentiation in products, but it gets very confusing for users when there are too many options (in the case of consumer ChatGPT, at least, this is still more limited than in pre-GPT-5 days). The issue is that there's differentiation at what I pay monthly (free vs. Plus vs. Pro) and also at the model layer - which essentially becomes a matrix of different options/limits per model (and we're not even getting into capabilities).

For someone who uses codex as well, there are 5 models there when I use /model (on the Plus plan; spark is only available to Pro plan users), with limits also tied to my same consumer ChatGPT plan.

I imagine the model differentiation is only going to get worse, since with more fine-tuned use cases there will be many different models (e.g. health-care answers, etc.) - is it really on the user to figure out what to use? The only saving grace is that it's not as bad as Intel or AMD CPU naming schemes / cloud-provider instance naming, but that's a very low bar.

lxgr 4 hours ago | parent | prev | next [-]

Thank you for confirming!

I've long suspected as much, but I always found the correspondence between API model names, the ChatGPT UI selector, and the actual model used very confusing, and was never sure whether I was actually switching models or just some parameters of the harness/model invocation.

> One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.

That's putting it mildly. In my experience, the "instant/chat" model is absolute slop tier, while the "thinking" one is genuinely useful and also has a much more palatable tone (even for things not really requiring a lot of thought).

Fortunately, the former clearly identifies itself with an absurd amount of emoji reminiscent of other early chatbots that shall not be named, so I know how to detect and avoid it.

merlindru 2 hours ago | parent | prev | next [-]

but why not have "sane defaults but configurable"?

hide away the extra complexity for everyone. give power users a way to get it back.

dotancohen an hour ago | parent [-]

The model doesn't even need to be exposed in the UI. Let the user specify "use model foobar-4" or "use a coding model" or "use a middle-tier attorney model".

VIM does this well: no UI, magic incantations to use features.

mrcwinn 3 hours ago | parent | prev | next [-]

Do your fully autonomous offensive weapons and domestic surveillance systems use Instant?

Computer0 3 hours ago | parent [-]

Not today, but response time would be a lot better if they did.

seejayseesjays 4 hours ago | parent | prev [-]

Forgiveness, but while you're here: can you look into why the Notion connector in chat doesn't have the capability to write pages, while the MCP (which I use via Codex) can? It looks like it's entirely possible - just a missing action in the connector.

idiotsecant 4 hours ago | parent [-]

none granted.