andai | 5 days ago
The GPT-5 used here is the Chat version, presumably gpt-5-chat-latest, which from what I can tell is the same version used in ChatGPT. That is not actually a single model but a "system": a router that semi-randomly forwards your request to various different models, in a way seemingly designed to massively reduce costs for OpenAI, judging by people reporting inconsistent output and often worse results than 4o. So it seems that not only would many of these requests never touch a reasoning model (or, as it works now, have reasoning set to "minimal"?), they are probably being routed to a mini or nano model.

It would make more sense, I think, to test on gpt-5 itself (and ideally the -mini and -nano as well), and perhaps with different reasoning effort, because that makes a big difference in many evals.

EDIT: Yeah, the Chat router is busted big time. It fails to apply thinking even for problems that obviously call for it (analyzing financial reports). You have to add "Think hard." to the end of the prompt, or explicitly switch to the Thinking model in the UI.
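A minimal sketch of the kind of sweep I mean, assuming the OpenAI Python SDK's Responses API and its reasoning effort parameter; the model names and the eval prompt here are placeholders, not a claim about what the article's benchmark actually ran:

```python
from itertools import product

# Assumed model names in the gpt-5 family and the documented
# reasoning-effort levels; adjust to whatever is actually available.
MODELS = ["gpt-5", "gpt-5-mini", "gpt-5-nano"]
EFFORTS = ["minimal", "low", "medium", "high"]

def eval_grid():
    """Every (model, reasoning_effort) pair to evaluate."""
    return list(product(MODELS, EFFORTS))

def run_one(client, prompt, model, effort):
    """One eval call via the Responses API (network call; client is an
    openai.OpenAI instance -- not exercised here)."""
    resp = client.responses.create(
        model=model,
        reasoning={"effort": effort},
        input=prompt,
    )
    return resp.output_text
```

Sweeping the full grid would make the effort-vs-accuracy tradeoff visible per model size, instead of leaving it to whatever the Chat router decides.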
kqr | 5 days ago | parent | next
This is correct, and was the reason I made sure to always append "Chat" to the end of "GPT-5". I should perhaps have been clearer about this. The reason I settled for the lesser router is that I don't have access to the full GPT-5, which would have been a much better baseline, I agree.
varenc | 5 days ago | parent | prev
> Yeah the Chat router is busted big time... You have to add "Think hard." to the end of the prompt, or explicitly switch to the Thinking model in the UI.

I don't really get this gripe. It seems no different than before, except that now it will sometimes opt into thinking harder by itself. If you know you want CoT reasoning, you just select gpt5-thinking, no different from choosing o4-mini/o3 before.