svnt 3 days ago

You’re absolutely wrong! This is not how reasoning models work. Chain-of-thought did not produce reasoning models.

Dylan16807 2 days ago | parent | next

How do they work then?

Because I thought chain of thought is what makes for reasoning. And the first Google result for 'chain of thought versus reasoning models' says it does: https://medium.com/@mayadakhatib/the-era-of-reasoning-models...

Give me a better source.

svnt 15 hours ago | parent

Did you even read the article you posted? It supports my statement.

CoT produces the linguistic scaffolding of reasoning, but doesn't actually deliver much reasoning accuracy on its own.

e.g. https://developer.nvidia.com/blog/maximize-robotics-performa...

lucb1e 3 days ago | parent | prev

Then I can't explain why it's producing the results that it does. If you have more information to share, I'm happy to update my knowledge...

Doing a web search on the topic just turns up marketing materials. Even Wikipedia's "Reasoning language model" article is mostly a list of release dates and model names; the only relevant-sounding remark on how these models are different is: "[LLMs] can be fine-tuned on a dataset of reasoning tasks paired with example solutions and step-by-step (reasoning) traces. The fine-tuned model can then produce its own reasoning traces for new problems."

It sounds like just another dataset: more examples, more training, in particular on worked examples where this "think step by step" method is demonstrated with known-good steps and values. I don't see how that fundamentally changes how it works; you're saying such models do not predict the most likely token for a given context anymore, that there is some fundamentally different reasoning process going on somewhere?
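
From that description, one fine-tuning record would look something like this, I think (my own made-up illustration, not from any real dataset):

    # hypothetical record from a reasoning-trace fine-tuning set, per the
    # Wikipedia description: a task, a worked step-by-step trace, a solution
    example = {
        "problem": "A train covers 120 km in 1.5 hours. What is its average speed?",
        "trace": [
            "Average speed = distance / time.",
            "Distance is 120 km and time is 1.5 hours.",
            "120 / 1.5 = 80.",
        ],
        "solution": "80 km/h",
    }

    # fine-tuning would then train the model to emit the trace followed by the
    # solution; as far as I can tell, that is still plain next-token prediction
    # over one long target string
    target_text = (
        example["problem"] + "\n"
        + "\n".join(example["trace"])
        + "\nAnswer: " + example["solution"]
    )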

svnt 15 hours ago | parent

I'm saying that adding "think step by step" does not get you close to actual reasoning; it just produces marginally self-consistent linguistic reasoning.

Actual reasoning requires training on diverse data sources, as you noted, but also coached experimentation (supervised fine-tuning), not just adding a "think step by step" instruction to a model trained on typical textual datasets. "Think step by step" came first and produced increased performance on a variety of tasks, but it was overhyped as an approximation of reasoning.
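
To make the distinction concrete (a rough sketch, not any particular framework's API, and the helper names are mine):

    # prompt-level CoT: an inference-time trick; the model's weights are
    # untouched, only the input text gains an instruction
    def cot_prompt(question):
        return "Let's think step by step.\n" + question

    # reasoning-model training: a training-time procedure; each curated example
    # pairs a problem with a coached, worked trace, and the weights are updated
    # to reproduce the trace before the answer
    def sft_target(problem, trace, answer):
        return problem + "\n" + "\n".join(trace) + "\nAnswer: " + answer

The first changes what you ask; the second changes what the model has learned.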