		meroes a day ago
		Yep. Chain of thought is just more context disguised as "reasoning". I'm saying this as an RLHF'er going purely off what I see, and I would never say there is reasoning involved. RLHF in general doesn't question models with defeating them as the sole goal. Most of the time the game is simulating expected prompts, so it's just a massive blob of context. A motivated RLHF'er can defeat models all day. Even in high-level math RLHF, you ultimately don't want to defeat the model; you want to supply it with context. Context, context, context.

		Now, you may say that of course you don't just throw "gotcha" questions at a learning student, so it'd be unfair to do that to LLMs. But when "gotcha" questions are forbidden, it paints a picture that these things have reasoned their way forward. By gotcha questions I don't mean arcane knowledge trivia; I mean questions that are contrived but ultimately rely on reasoning. Contrived means a lack of context, because the models aren't trained on contrivance, yet contrivance is easily defeated by reasoning.