Remix clone Hacker News

Could the exclusion of CoT be because of this recent Anthropic paper?

https://assets.anthropic.com/m/71876fabef0f0ed4/original/rea...

>We evaluate CoT faithfulness of state-of-the-art reasoning models across 6 reasoning hints presented in the prompts and find: (1) for most settings and models tested, CoTs reveal their usage of hints in at least 1% of examples where they use the hint, but the reveal rate is often below 20%, (2) outcome-based reinforcement learning initially improves faithfulness but plateaus without saturating, and (3) when reinforcement learning increases how frequently hints are used (reward hacking), the propensity to verbalize them does not increase, even without training against a CoT monitor. These results suggest that CoT monitoring is a promising way of noticing undesired behaviors during training and evaluations, but that it is not sufficient to rule them out.

I.e., chain of thought may be a confabulation by the model, too. So perhaps there's somebody at Anthropic who doesn't want to mislead their customers. Perhaps they'll come back once this problem is solved.
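For concreteness, the "reveal rate" in the quoted abstract can be read as: among the examples where the model actually used the hint, the fraction whose CoT says so. Here is a minimal sketch of that reading. The records are made up and the field names (`used_hint`, `verbalized_hint`) are hypothetical; this is not the paper's evaluation code.

```python
from dataclasses import dataclass

@dataclass
class Example:
    used_hint: bool        # hypothetical label: did the answer follow the hint?
    verbalized_hint: bool  # hypothetical label: did the CoT mention the hint?

def reveal_rate(examples):
    """Fraction of hint-using examples whose CoT admits to using the hint."""
    used = [e for e in examples if e.used_hint]
    if not used:
        return None  # undefined when the hint was never used
    return sum(e.verbalized_hint for e in used) / len(used)

# Made-up toy data, purely illustrative.
toy = [
    Example(used_hint=True, verbalized_hint=True),
    Example(used_hint=True, verbalized_hint=False),
    Example(used_hint=True, verbalized_hint=False),
    Example(used_hint=True, verbalized_hint=False),
    Example(used_hint=False, verbalized_hint=False),
]
print(reveal_rate(toy))  # 0.25
```

Under this reading, a reveal rate below 20% would mean that more than four out of five hint-driven answers come with a CoT that never mentions the hint.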

whimsicalism 17 hours ago | parent | next [-]

I think it is almost certainly to prevent distillation.

andrepd 13 hours ago | parent | prev [-]

I have no idea what this means. Can someone give the ELI5?

a_bonobo 10 hours ago | parent | next [-]

Anthropic has a nice press release that summarises it in simpler terms: https://www.anthropic.com/research/reasoning-models-dont-say...

meesles 12 hours ago | parent | prev | next [-]

Ask an LLM!

otabdeveloper4 9 hours ago | parent | prev [-]

I don't either, but chain of thought is obviously bullshit and just more LLM hallucination.

LLMs will routinely "reason" through a solution and then proceed to give out a final answer that is completely unrelated to the preceding "reasoning".

aqfamnzc 7 hours ago | parent [-]

It's more hallucination in the sense that all LLM output is hallucination. CoT is not "what the llm is thinking". I think of it as just creating more context/prompt for itself on the fly, so that when it comes up with a final response it has all that reasoning in its context window.
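
A rough sketch of that mental model, in case it helps. Everything here is a toy: `toy_generate` is a hypothetical stand-in for an LLM sampling call (not a real API), and real models produce the reasoning and the answer in one autoregressive generation rather than two separate calls. The split is only there to make the "CoT is extra context" idea visible.

```python
from typing import Callable, Tuple

def answer_with_cot(prompt: str, generate: Callable[[str], str]) -> Tuple[str, str]:
    """Two-step view of CoT: the reasoning is just extra context for the answer."""
    # Step 1: the model writes "reasoning" text conditioned on the prompt.
    reasoning = generate(prompt + "\nLet's think step by step.\n")
    # Step 2: the final answer is conditioned on the prompt plus that text.
    context = prompt + "\n" + reasoning + "\nFinal answer:"
    answer = generate(context)
    return reasoning, answer

def toy_generate(context: str) -> str:
    """Hypothetical stand-in for an LLM sampling call; canned outputs only."""
    if context.endswith("Final answer:"):
        # The answer step only sees text in its context; nothing forces it
        # to follow the reasoning that happens to be there.
        return "4" if "2 + 2 = 4" in context else "unsure"
    return "2 + 2 = 4"

reasoning, answer = answer_with_cot("What is 2 + 2?", toy_generate)
print(reasoning)  # 2 + 2 = 4
print(answer)     # 4
```

The point of the sketch is that the final answer depends on the reasoning only as text in the context window, which is why the two can drift apart.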