Remix.run Logo
pqdbr 5 days ago

Same here. Even with Opus in Claude Code I'm getting terrible results, sometimes feeling we went back to the GPT 3.5 eon. And it seems they are implementing heavily token-saving measures: the model does not read context anymore unless you force it to, making up method calls as it goes.

mh- 5 days ago | parent | next [-]

The simplest thing I frequently ask of regular Claude (not Code) in the desktop app:

"Use your web search tool to find me the go-to component for doing xyz in $language $framework. Always link the GitHub repo in your response."

Previously Sonnet 4 would return a good answer to this at least 80% of the time.

Now even Opus 4.1 with extended thinking frequently ignores my ask for it to use the search tool, which allows it to hallucinate a component in a library. Or maybe an entire repo.

It's gone backwards severely.

(If someone from Anthropic sees this, feel free to reach out for chat IDs/share links. I have dozens.)

spicybright 5 days ago | parent | next [-]

Glad I'm not crazy. I actually noticed both 4 models are just garbage. I started running my prompts through those, and Sonnet 3.7 comparing the results. Sonnet 3.7 is way better at everything.

idonotknowwhy 4 days ago | parent [-]

You're not crazy, and this isn't new for Anthropic. Something is off with Opus4.1, I actually saw it make 2 "typos" last week (I've never seen a model like this make a dumb "typo" before). And it's missing details that it understood last month (can easily test this if you have some chats in OpenWebUI or LibreChat, just go in and hit regenerate).

Sonnet 3.5 did this last year a few times, it'd have days where it wasn't working properly, and sure enough, I'd jump online and see "Claude's been lobotomized again".

They also experiment with injecting hidden system prompts from time to time. Eg. if you ask for a story about some IP, it'll interrupt your prompt and remind the model not to infringe copyright. (We could see this via API with prompt engineering, adding a "!repeat" "debug prompt" that revealed it, though they seem to have patched that now.

> I started running my prompts through those, and Sonnet 3.7 comparing the results. Sonnet 3.7 is way better at everything.

Same here. And on API, the old Opus 3 is also unaffected (though that model is too old for coding).

dingnuts 5 days ago | parent | prev [-]

How is this better/faster than typing "xyz language framework site://github.com" into Kagi

IDK about you but I find it faster to type a few keywords and click the first result than to wait for "extended thinking" to warm up a cup of hot water only to ignore "your ask" (it's a "request," not an "ask," unless you're talking to a Product Manager with corporate brain damage) to search and then outputs bullshit.

I can only assume after you waste $0.10 asking Claude and reading the bullshit, you use normal search.

Truly revolutionary rechnology

j45 5 days ago | parent | prev [-]

I’m running into this as well.

Might be Claude optimizing for general use cases compared to code and that affecting the code side?

Feels strange, because Claude api isn’t the same as the web tool so I didn’t expect Claude code to be the same.

It might be a case of having to learn to read Claude best practice docs and keep up with them. Normally I’d have Claude read them itself and update an approach to use. Not sure that works as well anymore.