| ▲ | eranation 6 hours ago | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Ask them to tell the LLM it's wrong... then when it goes "You are absolutely right!" to challenge it and say that it was a test. Then when it replies, ask it if it's 100% sure. They'll lose faith pretty quick. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | ericpauley 5 hours ago | parent | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This is an oft-repeated meme, but I’m convinced the people saying it are either blindly repeating it, using bad models/system prompts, or some other issue. Claude Opus will absolutely push back if you disagree. I routinely push back on Claude only to discover on further evaluation that the model was correct. As a test I just did exactly what you said in a Claude Opus 4.6 session about another HN thread. Claude considered* the contradiction, evaluated additional sources, and responded backing up its original claim with more evidence. I will add that I use a system prompt that explicitly discourages sycophancy, but this is a single sentence expression of preference and not an indication of fundamental model weakness. * I’ll leave the anthropomorphism discussions to Searle; empirically this is the observed output. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | beeflet 4 hours ago | parent | prev [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I tried to fool claude sonnet with confidence and it failed. https://claude.ai/share/47145af0-47d1-451b-813c-131ec48e7215 Maybe it is possible with a more complex or subjective question. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||