svnt 3 hours ago

Look at any recent CoT output where the model is trying to infer from an underspecified prompt what the user wants or means.

It is generally the first thing they do: try to figure out what you meant by the prompt. When they can't infer your intent, good models ask follow-up questions to clarify.

I am wondering if this is a semantics issue, since intent inference is an established area of research, e.g. https://arxiv.org/pdf/2501.10871

batshit_beaver 2 hours ago

Right, and then look at any number of research papers showing that CoT output has limited impact on the end result. We've trained these models to pretend to reason.

atleastoptimal 25 minutes ago

If it's only pretending to reason, then how is it that the CoT output improves performance on every single benchmark/test?
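
For concreteness, the comparison being pointed at looks roughly like this (a minimal Python sketch; ask_model is a hypothetical stand-in for whatever completion API you use, and the containment check is a deliberate simplification):

    def ask_model(prompt: str) -> str:
        """Hypothetical stand-in for a real LLM completion call."""
        raise NotImplementedError("wire up a real model here")

    def benchmark_accuracy(items, cot: bool) -> float:
        # items: list of (question, gold_answer) pairs
        correct = 0
        for question, gold in items:
            suffix = "\nLet's think step by step." if cot else "\nAnswer:"
            reply = ask_model(question + suffix)
            correct += gold in reply  # crude scoring, sketch only
        return correct / len(items)

The empirical pattern is that cot=True reliably scores higher than cot=False, whatever the intermediate text "really" is.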