I am pretty convinced that for most day-to-day work, any perceived improvements from, for example, the latest Claude models were total placebo. In blind tests on normal tasks, people would probably have no idea whether they're using Opus 4.5 or 4.6.

sumeno 3 hours ago | parent | next [-]

This has basically been my experience since Sonnet 3.5. I've been working on a personal project on and off with various models and things since then, and the biggest difference between then and now is that it will do larger chunks of work than it did before. The quality of the code is not particularly better, though: I still have to do a lot of cleanup, and it still goes off the rails pretty frequently. I need fewer individual prompts, but reviewing the code takes longer because I have to mentally process and fix larger chunks of code.

Is it a better user experience now? Yes. Has it boosted my productivity on this project? Absolutely.

But it still needs a ton of hand-holding for anything complicated, and I still deal with tons of "OK, this bug is fixed now!" followed by manually confirming that the bug still exists.

BoumTAC 13 hours ago | parent | prev | next [-]

It's because they are getting so good that it's impossible to tell them apart.

Haiku 4.5 is already so good it's ok for 80% (95%?) of dev tasks.

FuckButtons 4 hours ago | parent | next [-]

I must be writing very different software than you. I keep Opus on a tight leash and it still comes to the strangest conclusions.

	lukan 3 hours ago | parent [-]
		Very possible. Some things work like a charm on the first try for me; with others you can spell it out again and again. And then yet again. Something to do with training data, obviously.

Bolwin 4 hours ago | parent | prev [-]

I've found Haiku to be truly mediocre to work with. If you want a cheap model, the open source ones are much better.

AussieWog93 14 hours ago | parent | prev [-]

I'd agree with you on 4.5 to 4.6, but going from gpt-5 or 4.0 to 4.5 was night and day.

	butILoveLife 6 hours ago | parent | next [-]
		GPT5 added the router, which was def a downgrade. 4.5 was probably the best non-CoT model humanity has made, but it was too expensive to run.
	NewLogic 14 hours ago | parent | prev [-]
		Because post 4.0 dropped the sycophancy?