podnami 7 hours ago

They lost me at Opus 4.7

Anecdotally, OpenAI is trying to get into our enterprise tooth and nail, and has offered unlimited tokens until summer.

Gave GPT-5.4 a try because of this and honestly I don’t know if we are getting some extra treatment, but running it at extra high effort the last 30 days I’ve barely seen it make any mistakes.

At some points even the reasoning traces brought a smile to my face as it preemptively followed things that I had forgotten to instruct it about but were critical to get a specific part of our data integrity 100% correct.

dsco 7 hours ago | parent | next [-]

Same here. I feel like all of these shenanigans could be because Anthropic are compute constrained, forcing them to take reckless risks around reducing it.

beering 4 hours ago | parent | prev | next [-]

GPT-5.4 was already better than Opus 4.6 in a lot of areas, especially correctness and tricky logic. I’m eager to see if 5.5 is even better.

cube2222 7 hours ago | parent | prev | next [-]

I’ve never been one to complain about new models, and also didn’t experience most of the issues folks were citing about Claude Code over the last couple of months. I’ve been using it since release, happy with almost every new update.

Until Opus 4.7 - this is the first time I rolled back to a previous model.

Personality-wise it’s the worst of AI: “it’s not x, it’s y”, strong short sentences, in general a bullshitty vibe, also gaslighting me that it fixed something even though it didn’t actually check.

I’m not sure what’s up, maybe it’s tuned for harnesses like Claude Design (which is great btw) where there’s an independent judge to check it, but for now, Opus 4.6 it is.

vorticalbox 7 hours ago | parent | prev | next [-]

Extra high burns tokens, I find. I run 5.4 on medium for 90% of the tasks, and high if I see medium struggling, and it's very focused and makes minimal changes.

dsco 7 hours ago | parent | next [-]

Yeah but it also then strikes the perfect balance between being meticulous and pragmatic. Also it pushes back much more often than other models in that mode.

DANmode 6 hours ago | parent | prev [-]

Rework burns tokens.

someguyiguess 4 hours ago | parent | prev | next [-]

I went back to 4.5. No regrets and it’s a bit cheaper.

SkyPuncher 3 hours ago | parent [-]

Same here. 4.6 was a downgrade in thinking quality, but I appreciated the extended context at first.

Over time, I realized the extended context became randomly unreliable. That was worse to me than having to compact and know where I was picking up.

robeym 6 hours ago | parent | prev | next [-]

What's your workflow like? I'd be curious to test OpenAI out again but Claude Code is how I use the models. Does it require relearning another workflow?

beering 4 hours ago | parent [-]

Isn’t it basically the same thing? You type what you want into the input box and it does what you ask for.

epsteingpt an hour ago | parent | prev | next [-]

Truth

enraged_camel 7 hours ago | parent | prev [-]

I find that it is better at thinking broadly and at a high level, on tasks that are tangential to coding like UX flows, product management and planning of complex implementations. I have yet to see it perform better than either Opus 4.6 or 4.7 though.