| ▲ | antirez 3 hours ago | |||||||||||||||||||||||||||||||||||||
Very good move. In my experience, for system programming at least, GPT 5.4 xhigh is vastly superior to Claude Opus 4.6 max effort. I ran many brutal tests, including reconstructing for QEMU the SCSI controller (not longer accessible) of a SVSY UNIX of the early 90s used in a 386. Side by side, always re-mirroring the source trees each time one did a breakthrough in the implementation. Well, GPT 5.4 single handed did it all, while Opus continued to take wrong paths. The same for my Redis bug tracking and development. But 200$ is too much for many people (right now, at least: the reality is that if frontier LLMs are not democratized, we will end paying like a house rent to a few providers), and also while GPT 5.4 is much stronger, it is slower and less sharp when the thing to do is simple, so many people went for Claude (also because of better marketing and ethical concerns, even if my POV is different on that side: both companies sell LLM models with similar capabilities and similar internal IP protection and so forth, to me they look very similar in practical terms). This will surely change things, and many people will end with a Claude 5x account + a Codex 5x account I bet. | ||||||||||||||||||||||||||||||||||||||
| ▲ | dweekly 2 hours ago | parent | next [-] | |||||||||||||||||||||||||||||||||||||
GPT 5.4 is the surly physics PhD post-doc who slowly and angrily sits in a basement to write brilliant, undocumented, uncommented code that encapsulates a breakthrough algorithm. Opus 4.6 is the L5 new hire SWE keen to prove their chops and quickly turn out totally reasonable code with putatively defensible reasons for doing it that way (that are sometimes tragically wrong) and then catch an after-work yoga class with you. | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
| ▲ | Tiberium 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
Thanks for confirming my impressions, it's been like 4 months now that I've arrived at the same conclusions. GPT models are just better at any kind of low-level work: reverse engineering including understanding what the decompiled code/assembly does, renaming that decompiled code (functions/types), any kind of C/C++, way more reliable security research (Opus will find way more, but most will turn out to be false positives). I've had GPT create non-trivial custom decompilers for me for binaries built with specific compilers (it's a much simpler task than what IDA Pro/Ghidra are doing but still complex), and modify existing Java decompilers. Regarding speed, I don't use xhigh that often, and surprisingly for me GPT 5.4 high is faster than Claude 4.6 Opus high (unless you enable fast mode for Opus). Of course I still use Opus for frontend, for some small scripts, and for criticizing GPT's code style, especially in Python (getattr). | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
| ▲ | Asyne 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
+1 to this, I've found GPT/Codex models consistently stronger in engineering tasks (such as debugging complex, cross-systems issues, concurrency problems, etc). I use both OpenAI and Anthropic models, though for different purposes, what surprises me is how underrated GPT still feels (or, alternatively, how overhyped Anthropic models can be) given how capable it is in these scenarios. There also seems to be relatively little recognition of this in the broader community (like your recent YouTube video). My guess is that demand skews toward general codegen rather than the kind of deep debugging and systems work where these differences really show. | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
| ▲ | thisisit 40 minutes ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
My non scientific tests has been that GPT models follow the prompts literally. Every time I give it an example, it uses the example in literal sense instead of using it to enhance its understanding of the ask. This is a good thing if I want it to follow instructions but bad if I want it to be creative. I have to tell it that the examples I gave are just examples and not to be used in output. I feel comfortable using it when I have everything mapped out. Claude on the other hand can be creative. It understands that examples are for reference purposes only. But there are times it decides to off on a tangent on its own and decide not to follow instructions closely. I find it useful for bouncing off ideas or test something new, The other thing I notice is Claude has slightly better UI design sensibilities even if you don’t give instructions. GPT on the other hand needs instructions otherwise every UI element will be so huge you need to double scroll to find buttons. | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
| ▲ | osti an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
Yup I've mentioned this in another thread, I got gpt 5.4xhigh to improve the throughout of a very complex non typical CUDA kernel by 20x. This was through a combination of architecture changes and then do low level optimizations, it did the profiling all by itself. I was extremely impressed. | ||||||||||||||||||||||||||||||||||||||
| ▲ | postalcoder 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
What I like most about gpt coding models is how predictable of a lever that thinking effort is. Xhigh will gather all the necessary context. low gathers the minimum necessary context. That doesn’t work as well with me for Opus. Even at max effort it’ll overlook files necessary to understanding implementations. It’s really annoying when you point that out and you get hit with an”you’re absolutely right”. Codex isn’t the greatest one shot horse in the race but, once you figure out how to harness it, it’s hard to go back to other models. | ||||||||||||||||||||||||||||||||||||||
| ▲ | SunshineTheCat 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
1000%. I have been running claude's work through codex for about a week now and it's insane the number of mistakes it catches. Not really sure why I've been doing this, just interesting to watch I guess. Not to mention a billion times more usage than you get with claude, dollar for dollar. | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
| ▲ | bob1029 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
GPT5.4 with any effort level is scary when you combine it with tricks like symbolic recursion. I actually had to reduce the effort level to get the model to stop trying to one shot everything. I struggled to come up with BS test cases it couldn't dunk in some clever way. Turning down the reasoning effort made it explore the space better. | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
| ▲ | sho_hn 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
Same for me, cf. https://news.ycombinator.com/item?id=47680123 | ||||||||||||||||||||||||||||||||||||||
| ▲ | zozbot234 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
The $100/mo giving access to GPT Pro (with reduced usage) is a nice counter to the just teased Claude Mythos. But GPT 5.4 xhigh being able to perform that kind of low-level reconstruction task is very impressive already. | ||||||||||||||||||||||||||||||||||||||
| ▲ | aerhardt 2 hours ago | parent | prev [-] | |||||||||||||||||||||||||||||||||||||
I completely agree with you on both the technical and ethical reasoning. Thank you for speaking out. I think it's important that reputable engineers like you do so. The Claude gang gaslighting is unhinged right now. It would be none of my concern but I have to deal with it in the real world - my customers are susceptible to these memes. I'm sure others have to deal with similar IRL consequences, too. | ||||||||||||||||||||||||||||||||||||||