seperman 3 days ago

Very interesting. Why does Claude find more problems if we mention the code is written by another developer?

mcintyre1994 3 days ago

Total guess, but maybe it breaks it out of the sycophancy that most models seem to exhibit?

I wonder if they’d also be better at things like telling you an idea is dumb if you say it’s from someone else and you’re just assessing it.

bgilly 3 days ago

In my experience, Claude will criticize others more than it will criticize itself. Seems similar to how LLMs in general tend to say yes to things or call anything a good idea by default.

I find it to be an entertaining reflection of the cultural nuances embedded into training data and reinforcement learning processes.

umbra07 3 days ago

Interesting. In my experience, it's the opposite. Claude is too sycophantic. If you tell it that it was wrong, it will just accept your word at face value. If I give the same problem to both Claude and Gemini, get different responses, and ask Claude why Gemini's answer differs, Claude will just roll over and tell me that Gemini's response was perfect and that it messed up.

This is why I was really taken by Gemini 2.0/2.5 when it first came out: it was the first model that really pushed back at you. It would even tell me, unprompted, that it wanted x additional information before continuing. Sadly, as Google has neutered 2.5 over the last few months, its independent streak has also gone away, and it's only slightly more individualistic than Claude/OpenAI's models.

daveydave 3 days ago

I would guess the training data (conversational, as opposed to coding-specific solutions) is weighted towards people finding errors in others' work, more than people discussing errors in their own. If you knew there was an error in your thinking, you probably wouldn't think that way.

gdudeman 3 days ago

Claude is very agreeable and is an eager helper.

It gives you the benefit of the doubt if you're coding.

It also gives you the benefit of the doubt if you're looking for feedback on your developer's work. If you give it a hint of distrust ("my developer says they completed this, can you check and make sure, give them feedback...?"), Claude will look out for you.