gck1 3 hours ago
I gave a fairly complex reverse-engineering (RE) task to DS-4 xhigh and GPT-5.5 xhigh today. After about 6 hours, both ultimately failed to fully reverse-engineer the target. However, there were some drastic differences.

DS stopped every 30 minutes or so, claiming it had done the full RE and that everything should work now. In fact, it hadn't completed even 1% of it. It also looked for shortcuts again and again, even though I prompted heavily that that specific shortcut must not be used. It was a complete and utter failure.

GPT-5.5, on the other hand, blew me away. It just did the right things. It didn't jump to the next steps until it was sure it had completed the initial layers and fully understood what was required. The only time I prompted it during those 6 hours was when I saw it heading in the right direction and could nudge it slightly towards an even better way. I never felt I was fighting it. Okay, maybe a little bit: after compaction, it would sometimes go off on a "no, I'm not helping you with reverse engineering" tangent, but that would resolve in a clean session.

I cancelled my Claude subscription a month ago, so I haven't tested Claude on this. But DeepSeek reminded me a lot of how I worked with Opus 4.6/4.7. Some might see that as a positive sign, but GPT-5.5 showed me that the way Claude and DeepSeek work is just way too annoying.
ttul 2 hours ago
What you’re experiencing is the difference in model intelligence. Most models can seem pretty good at simple stuff over short time horizons. Complex work requires that more intelligence be stuffed into those trillion-dimensional spaces.
cmrdporcupine 31 minutes ago
The GPT models are heavily biased toward a more incremental, empirical, evidence-based approach. Sometimes to a fault. I prefer them for this reason. If you don't like their highly staged, one-piece-at-a-time approach, though, it takes coaxing or strategic use of /goal to break them out of it. I suspect that for people doing more... website-type development, the "yeet this into existence" style of Opus feels preferable. With Claude, I was constantly jamming my finger on the Escape key: "Wait, you did what?! Based on what proof?!"