Totally agreed. I sometimes wonder if they are making the model "lazy" with each iteration; it keeps getting better at avoiding work.

▲

skerit 9 hours ago | parent [-]

This is why Fable was so good. It followed instructions and it was in no way lazy.

▲

DontchaKnowit 9 hours ago | parent | next [-]

People keep making comments about Fable like this? You could only use it for what like a week? How is that at all enough time to evaluate? Opus 4.6 didn't suffer from these problems for a hot minute, and then when newer models were released it got worse. I think they change a ton behind the scenes and allocate compute however they want, so the model you use today may behave much differently than it did yesterday.

	▲	pdimitar 8 hours ago | parent | next [-]
		> You could only use it for what like a week? How is that at all enough time to evaluate?
		By observing how in 4 workdays it achieved more than Opus did in ~11 days. I am my team's backend lead, and the Fable 5 model finally turned the tide on my overwhelming backlog. Back to Opus, and I have to treat it like a special-education kid multiple times a day.
	▲	boc 8 hours ago | parent | prev | next [-]
		The ~72 hours I had access to Fable were by far the most productive I've had in months. I rewrote massive parts of my codebase and caught a ton of bugs and logic issues that had silently slipped through before. I went over my subscription limit and immediately started paying the API price to keep going. It was that good.
	▲	plorkyeran 8 hours ago | parent | prev | next [-]
		It was a pretty stark difference. I had the opposite problem, where it did too much and overshot what I wanted from it, so I assume that if it had stuck around it would have gotten tuned back a bit pretty quickly.
	▲	marcindulak 6 hours ago | parent | prev | next [-]
		For me, claude-fable-5 failed the instruction-following test I'm running against various models: https://github.com/marcindulak/claude-fails-to-follow-claude...
	▲	tskj 8 hours ago | parent | prev | next [-]
		Honestly, you didn't really have to use it for more than a day to tell what a shocking paradigm change it was. Man, do I miss it.
	▲	Analemma_ 8 hours ago | parent | prev [-]
		Heh, it's not crazy if you're here in the Bay: I know multiple people who more-or-less disappeared for days when Fable came out because they were running their benchmarks, and only emerged blinking into the sunlight when the USG banned it. That's just how things are here now: most people are normal, but there are some serious LLM dope addicts out and about.

▲

acters 9 hours ago | parent | prev [-]

I've been seeing LLMs act lazy from the very beginning. They got a little better, but smaller models really only want to be given a single task. Mythos at least does work. RIP