Remix.run Logo
minimaxir 4 hours ago

Believe me, I wish it was just parroting.

The real annoying thing about Opus 4.5 is that it's impossible to publicly say "Opus 4.5 is an order of magnitude better than coding LLMs released just months before it" without sounding like a AI hype booster clickbaiting, but it's the counterintuitive truth, to my personal frustration.

I have been trying to break this damn model since its November release by giving it complex and seemingly impossible coding tasks but this asshole keeps doing them correctly. GPT-5.3-Codex has been the same relative to GPT-5.2-Codex, which just makes me even more frustrated.

Denzel 40 minutes ago | parent | next [-]

Weird, I broke Opus 4.5 pretty easily by giving some code, a build system, and integration tests that demonstrate the bug.

CC confidently iterated until it discovered the issue. CC confidently communicated exactly what the bug was, a detailed step-by-step deep dive into all the sections of the code that contributed to it. CC confidently suggested a fix that it then implemented. CC declared victory after 10 minutes!

The bug was still there.

I’m willing to admit I might be “holding it wrong”. I’ve had some successes and failures.

It’s all very impressive, but I still have yet to see how people are consistently getting CC to work for hours on end to produce good work. That still feels far fetched to me.

viking123 3 hours ago | parent | prev [-]

It still cannot solve a synchronization issue in my fairly simple online game, completely wrong analysis back to back and solutions that actually make the problem worse. Most training data is probably react slop so it struggles with this type of stuff.

But I have to give it to Amodei and his goons in the media, their marketing is top notch. Fear-mongering targeted to normies about the model knowing it is being evaluated and other sort of preaching to the developers.