epolanski 9 hours ago

I've been saying for ages that since Opus 4.6, models have been getting smarter but less helpful as assistants.

Fable was amazing as a vibecoder, but as an assistant it can't resist jumping into implementation and filling chats with pointless jargon.

It's really grim if you're looking for assistance instead of an implementor.

GPT 5.5 Pro and Fable are gorgeous bullshitters that pretend to be right (often convincingly, because they are very smart) even when they are wrong, and I need tons of energy to process their output.

I don't like it but don't know what to do, Anthropic models especially increasingly ignore instructions whether in memory or agents files.

thewebguyd 9 hours ago | parent | next [-]

By design, unfortunately. If they are just assistants, they can't sell the dream of "we're going to replace human labor completely" to the C-suite.

baq 9 hours ago | parent | next [-]

It isn’t a dream, it’s a reality for some of us here and it will be increasingly so for everyone else. Amazingly, USG intervening slowed the dynamic greatly (fortunately?)

The problem is obviously who will be left. There’s a lot of scifi to catch up on.

epolanski 9 hours ago | parent | prev [-]

I think that they are simply evaluated on prompt to solution benchmarks.

whstl 8 hours ago | parent | prev | next [-]

Yep, this is why experiences and ratings of models vary so wildly.

I recently migrated a very large web app to Tailwind and Opus kept screwing up over and over, refactoring and changing the design, and it got worse the more complex the component became.

I ended up asking Haiku to do it and it managed to do everything correctly, pretty much without intervention.

mullingitover 8 hours ago | parent | prev | next [-]

> I don't like it but don't know what to do, Anthropic models especially increasingly ignore instructions whether in memory or agents files.

I've taken to instructing the agent to manage a subagent, where the principal agent's sole job is to ensure the subagent follows instructions to the letter.

epolanski 6 hours ago | parent | prev [-]

Just to follow up on what I mean, this was my first interaction with Sonnet 5:

"I just cloned this repo, investigate how to set it up, don't install anything, just collect information"

_spews information_

I proceed with the setup, but get a Linux specific dependency in a bash script, so I want to evaluate whether it can be rewritten...

"There's this error on MacOS, I think it's because we need linux-utils from brew, verify whether the script can be written in bare posix"

_proceeds installing linux-utils and all the rest_

"Didn't I tell you to not install anything?"

_you're absolutely right_

F*k me..