Did Claude's quality drop recently?
27 points by MeetingsBrowser a day ago | 16 comments
At least subjectively, Claude has been outperforming other LLMs for me for a while. They seemed to be having significant availability issues this week. As of this morning things are working for me, but the quality of the responses seems to be much worse. I frequently ask for lists of ideas and then drill down in follow-up prompts. Today Claude keeps responding to my request for bulleted lists with something like, “I should not use lists unless explicitly requested and should instead write in paragraphs”, and then responds in paragraphs. Is anyone else having a similar experience?
devonsolomon an hour ago
This was addressed on the Lex Fridman podcast by Dario Amodei (CEO) and Amanda Askell of Anthropic, and they both insist the answer is no. I interpret their explanation for “no” as follows: these are probabilistic outputs, and so given any changes, for some inputs, some outputs will be worse some of the time. The argument goes that, given they’re probabilistic, even without changes, for some inputs, some outputs will be worse than the last time you gave it that input, some of the time. To be fair to them, it makes sense that any change would then be met with some vocal users who are genuinely experiencing worse output, but are not generally using a worse product.
wqaatwt 20 hours ago
Yes. It seems to be almost incapable of communicating in anything but terse (up to 5 words or so) bullet points. And even when you force it to write in coherent sentences, the output still seems markedly worse than it used to be.
muzani 17 hours ago
Sounds like it might be switching to Claude Haiku instead of Claude Sonnet. Sonnet 3.5 always has this issue for me though. It excessively follows the original instructions, even in vague ways. It's likely 3.5 (new) is even worse. We use 3.0 in production because of this one quirk.
philshem a day ago
Yes, there are some reports. For example: https://news.ycombinator.com/item?id=42215912
patrickhogan1 18 hours ago
Well, first off, there is no such thing as "Claude"; there are multiple models that you can select from. You did not say which model you were using. In my opinion the Claude 3.5 Sonnet model is spectacular. It’s the best model yet for coding, both on leaderboards and empirically in projects I’ve had it help me with. This topic is discussed in a recent Lex Fridman interview with the CEO of Anthropic, where he very clearly walks through how these claims of it being dumber are not true. It’s a great interview, and after listening to it I’m even more bullish on Anthropic. There was a small degradation in performance, for which they posted an alert at the top of the page 2 nights ago. It didn’t affect the quality of the responses I got, but it did cause somewhat of a slowdown in response speed.
sk11001 18 hours ago
I think so, it was bad enough for me to cancel my subscription.
ldjkfkdsjnv 17 hours ago
The new model is almost certainly a cheaper version of the older model, where they tried to maintain quality.