lucb1e | 3 days ago
Then I can't explain why it's producing the results that it does. If you have more information to share, I'm happy to update my knowledge... Doing a web search on the topic just turns up marketing materials. Even Wikipedia's "Reasoning language model" article is mostly a list of release dates and model names, with the only relevant-sounding remark on how these models are different being: "[LLMs] can be fine-tuned on a dataset of reasoning tasks paired with example solutions and step-by-step (reasoning) traces. The fine-tuned model can then produce its own reasoning traces for new problems."

It sounds like just another dataset: more examples, more training, in particular on worked examples where this "think step by step" method is demonstrated with known-good steps and values. I don't see how that fundamentally changes how it works. Are you saying such models no longer predict the most likely token for a given context, and that there is some fundamentally different reasoning process going on somewhere?
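For concreteness, this is roughly the kind of record that Wikipedia sentence seems to describe, flattened into a single training string. This is my own sketch; the field names and the "<think>" delimiters are made up for illustration, not any lab's actual format:

    import json

    # One hypothetical reasoning-trace example: a task, worked steps, final answer.
    example = {
        "problem": "A train travels 120 km in 1.5 hours. What is its average speed?",
        "reasoning_trace": [
            "Average speed = distance / time.",
            "The distance is 120 km and the time is 1.5 h.",
            "120 / 1.5 = 80.",
        ],
        "final_answer": "80 km/h",
    }

    def to_training_text(rec):
        # Flatten the record into the text the model is trained to continue,
        # token by token. Real datasets use their own delimiters/special tokens.
        trace = "\n".join(rec["reasoning_trace"])
        return (f"Problem: {rec['problem']}\n"
                f"<think>\n{trace}\n</think>\n"
                f"Answer: {rec['final_answer']}")

    print(to_training_text(example))   # the string the objective is applied to
    print(json.dumps(example))         # one line of such a JSONL corpus

If that's all there is to it, the objective over that flattened string is still next-token prediction; the "reasoning" is just more text to predict.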
svnt | 15 hours ago | parent
I'm saying that adding "think step by step" does not get you close to actual reasoning; it just produces marginally self-consistent linguistic reasoning. Actual reasoning requires training on diverse data sources, as you noted, but also coached experimentation (supervised fine-tuning), not just adding a "think step by step" instruction to a model trained on typical textual datasets. "Think step by step" came first and produced increased performance on a variety of tasks, but it was overhyped as an approximation of reasoning.
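To put the distinction in code (placeholder names throughout; `generate` stands in for whatever decoding call you use, nothing here is a real API):

    def zero_shot_cot(problem, generate):
        # The 2022-era prompt trick: same base model, just an instruction appended.
        return generate(f"{problem}\nLet's think step by step.")

    def finetuned_reasoner(problem, generate_ft):
        # A model additionally fine-tuned on worked reasoning traces; the call
        # looks the same, but the difference is in the weights, not the prompt.
        return generate_ft(f"Problem: {problem}")

The first only changes the prompt; the second changes the weights, which is where the difference I'm pointing at lives.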