energy123 3 days ago

It's the best model pound for pound, but I find GPT 5.2 Thinking/Pro more useful for serious work when run with xhigh effort. I can get it to think for 20 minutes, whereas Gemini 3.0 Pro tops out around 2.5 minutes. Obviously I lack full visibility, since tok/s and token efficiency likely differ between them, but I take thinking time as a proxy for how much compute they're giving us per inference, and it matches my subjective judgment of output quality. Maybe Google nerfs the reasoning effort in the Gemini subscription to save money, and that's why I'm seeing this.
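
For what it's worth, a better proxy than wall-clock time would be the reasoning-token counts the API reports, since those aren't confounded by tok/s or queueing. Rough sketch, assuming the OpenAI Python SDK's Responses API; the model id here is made up and the usage field names are from memory, so treat them as approximate:

    # Compare reasoning-token spend across effort levels (sketch).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    for effort in ("low", "medium", "high"):
        resp = client.responses.create(
            model="gpt-5.2",  # illustrative model id, not a confirmed one
            reasoning={"effort": effort},
            input="Plan a migration of this service to event sourcing.",
        )
        details = resp.usage.output_tokens_details
        print(effort, "reasoning tokens:", details.reasoning_tokens,
              "| total output tokens:", resp.usage.output_tokens)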

knowriju 3 days ago | parent | next [-]

When ChatGPT takes 20 minutes to reason, is it actually spending all that time burning tokens, or does the bulk of it go into 'scheduling' waits? If someone specifically selected xhigh reasoning, I'm guessing the request can be scheduled at a much higher batch size, since the user has already signaled they'll tolerate latency.
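
Toy illustration of the batching guess, purely speculative on my part (nothing OpenAI has documented): latency-insensitive xhigh requests could sit in a queue until a large batch fills or a timeout fires, trading wait time for GPU efficiency.

    import queue
    import time

    BATCH_SIZE = 64    # assumption: xhigh users tolerate big batches
    MAX_WAIT_S = 30.0  # assumption: cap on how long a request sits queued

    pending: "queue.Queue[str]" = queue.Queue()

    def collect_batch() -> list[str]:
        """Block until BATCH_SIZE requests arrive or MAX_WAIT_S elapses."""
        batch: list[str] = []
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(pending.get(timeout=remaining))
            except queue.Empty:
                break
        return batch  # bigger batch = cheaper tokens, more time spent waiting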

cj 3 days ago | parent | prev [-]

I'm curious, what types of prompts are you running that benefit from 10+ minutes of think time?

What's the quality difference between default ChatGPT and Thinking? Is it an extra 20% quality boost, or is the difference night and day?

I've often imagined it would be great to have some kind of Chrome extension or third-party tool that always runs prompts at multiple thinking tiers, so you get an immediate response to read while you wait for the thinking models to finish.
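
Against the API it's only a few lines. A minimal sketch, assuming the OpenAI Python SDK's async client and its reasoning effort parameter (the model id is illustrative): fire the same prompt at several tiers and print each answer as it lands.

    import asyncio
    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    async def ask(effort: str, prompt: str) -> None:
        resp = await client.responses.create(
            model="gpt-5.2",  # illustrative model id
            reasoning={"effort": effort},
            input=prompt,
        )
        # Fast tiers print first; slower tiers arrive while you read.
        print(f"--- {effort} ---\n{resp.output_text}\n")

    async def main() -> None:
        prompt = "Outline a caching strategy for our read-heavy API."
        await asyncio.gather(*(ask(e, prompt) for e in ("low", "medium", "high")))

    asyncio.run(main())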

energy123 3 days ago | parent [-]

It's for planning system architecture, when I want something good (judged against the criteria I give it) rather than the first thing that runs.

I use Thinking and Pro. I don't use the default ChatGPT, so I can't comment on that. The difference between Thinking and Pro is modest but detectable. The 20-minute thinking times are with Pro, not Thinking. But Pro only allows 60k tokens per prompt, so I sometimes can't use it.

In the $200/month subscription they give you access to a "heavy thinking" tier for Thinking, which increases test-time compute by maybe 30% compared to what you get on Plus.

Version467 3 days ago | parent [-]

I recently bought into the $200 tier and was genuinely surprised at ChatGPT 5.2 Pro's ability for software architecture planning. If you give it ~60k tokens of your codebase and a thorough description of what you actually want to happen, it comes up with very good ideas. The biggest difference to me is how thorough it is. That's something I already noticed with the codex high/xhigh models compared to gemini 3 pro and opus 4.5, but gpt pro is noticeably better still.
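
The "~60k tokens of your codebase" step is easy to script, by the way. A rough sketch using tiktoken to pack source files into a fixed token budget; the encoding name and the budget are my assumptions, not anything OpenAI specifies for Pro:

    import pathlib
    import tiktoken

    BUDGET = 60_000  # assumed per-prompt token cap
    enc = tiktoken.get_encoding("o200k_base")  # assumed encoding

    def pack_codebase(root: str) -> str:
        """Concatenate source files until the token budget is exhausted."""
        parts: list[str] = []
        used = 0
        for path in sorted(pathlib.Path(root).rglob("*.py")):
            text = f"# file: {path}\n{path.read_text(errors='ignore')}\n"
            n = len(enc.encode(text))
            if used + n > BUDGET:
                break
            parts.append(text)
            used += n
        return "".join(parts)

    context = pack_codebase("src")
    print(f"packed ~{len(enc.encode(context))} tokens")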

I guess it's not talked about as much because far fewer people have access to it, but after spending a bunch of time with gemini 3 and opus 4.5 I don't feel that openai has lost the lead at all. The benchmarks tell a different story, but for my real-world use cases codex and gpt pro are still ahead: better at sticking to my intent, with fewer mistakes overall. It's slow, yes. But I can't write requirements as quickly as opus can misunderstand them anyway.