| ▲ | roxolotl 15 hours ago | |
These studies are always really hard to judge the efficacy of. I would say, though, that the most surprising thing to me about LLMs in the past year is how many people got hyped about the Opus 4.5 release. Having used Claude Code at work since it was released, I haven't really noticed any step changes in improvement. Maybe that's because I've never tried to use it to one-shot things? Regardless, I'm more inclined to believe that 4.5 was the point at which people started using it after having given up on copy/pasting output in 2024. If you're going from chat to an agentic level of interaction, it's going to feel like a leap. | ||
| ▲ | eterm 15 hours ago | parent | next [-] | |
I used it with Sonnet 4.0 a lot, and there was vastly more back-and-forth and correction of "dumb" things, such as forgetting to add "using" statements in C# files. I don't know if it's the model, harness improvements, built-in memory, or all of the above, but it now often has a step where it checks itself before trying to build, rather than building and getting an inevitable failure. Those small things add up to a much smoother and richer experience today compared to 6 months ago. | ||
| ▲ | tossandthrow 15 hours ago | parent | prev [-] | |
Nah, pre-4.5, agentic coding was not comfortable to use. | ||