ForHackernews 6 days ago

Didn't Anthropic show that LLMs frequently hallucinate their "reasoning" steps?

> Bullshitting (Unfaithful): The model gives the wrong answer. The computation we can see looks like it’s just guessing the answer, despite the chain of thought suggesting it’s computed it using a calculator.

https://transformer-circuits.pub/2025/attribution-graphs/bio...