andai 5 days ago

The GPT-5 used here is the Chat version, presumably gpt-5-chat-latest, which from what I can tell is the same version used in ChatGPT. That one is not actually a model but a "system": a router that semi-randomly forwards your request to various different models (in a way that looks designed to massively reduce costs for OpenAI, going by people reporting inconsistent output and often worse results than 4o).

So from this it seems that not only would many of these requests not touch a reasoning model (or as it works now, have reasoning set to "minimal"?), but they're probably being routed to a mini or nano model?

It would make more sense, I think, to test on gpt-5 itself (and ideally the -mini and -nano as well), and perhaps with different reasoning effort, because that makes a big difference in many evals.
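A minimal sketch of what that test matrix could look like, assuming the three model names mentioned above and four reasoning-effort levels. The effort values and the `reasoning_effort` key are assumptions for illustration; check them against the current API documentation before sending real requests. The helper `build_eval_grid` is hypothetical and only builds the configurations, it does not call any API.

```python
from itertools import product

# Assumed model names and effort levels (not confirmed by this thread).
MODELS = ["gpt-5", "gpt-5-mini", "gpt-5-nano"]
EFFORTS = ["minimal", "low", "medium", "high"]


def build_eval_grid(models=MODELS, efforts=EFFORTS):
    """Return one request config per (model, effort) pair.

    The "reasoning_effort" key is a placeholder name; map it to whatever
    parameter your API client actually expects.
    """
    return [
        {"model": model, "reasoning_effort": effort}
        for model, effort in product(models, efforts)
    ]


if __name__ == "__main__":
    # Each config would be one full run of the eval.
    for config in build_eval_grid():
        print(config)
```

Running the same eval once per configuration would show how much of the score depends on model size versus reasoning effort.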

EDIT: Yeah the Chat router is busted big time. It fails to apply thinking even for problems that obviously call for it (analyzing financial reports). You have to add "Think hard." to the end of the prompt, or explicitly switch to the Thinking model in the UI.

kqr 5 days ago | parent | next [-]

This is correct, and it was the reason I made sure to always append "Chat" to the end of "GPT-5". I should perhaps have been clearer about this. The reason I settled for the lesser router is that I don't have access to the full GPT-5, which would have been a much better baseline, I agree.

andai 5 days ago | parent [-]

Do they require a driver's license to use it? They asked for my ID for o3 Pro a few months ago.

kqr 5 days ago | parent [-]

That's the step at which I gave up, anyway.

varenc 5 days ago | parent | prev [-]

> Yeah the Chat router is busted big time... You have to add "Think hard." to the end of the prompt, or explicitly switch to the Thinking model in the UI.

I don't really get this gripe? It seems no different than before, except now it will sometimes opt into thinking harder by itself. If you know you want CoT reasoning you just select gpt5-thinking, no different than choosing o4-mini/o3 like before.

andai 4 days ago | parent [-]

Your options are now effectively 4o and o3 (GPT-5 and GPT-5 Thinking). There's no equivalent to o4-mini (at least not via ChatGPT -- I think you could emulate it through some combination of gpt-5 model size (-mini?) and reasoning effort).

o4-mini was the best reasoning model in my experience, because it gave you approximately o3 performance in a very small fraction of the time o3 would take.

ChatGPT's GPT-5 Thinking is even slower than o3; it's closer to o3 Pro! Some messages take 4 minutes to get a reply, which is great when that's what you want, but the niche of "5 seconds to get an actually intelligent response" has been deleted for no good reason.