▲ | xg15 a day ago
> There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process;

Isn't the whole point of chain-of-thought that the tokens sort of are the reasoning process? Yes, there is more internal state in the model's hidden layers while it predicts the next token - but that information is gone at the end of that prediction pass. The information that is kept "between one token and the next" is really only the tokens themselves, right? So in that sense, the OP would be wrong.

Of course, we don't know what kind of information the model encodes in the specific token choices - i.e. the tokens might not mean to the model what we think they mean.
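To make that concrete, here is a minimal sketch of greedy autoregressive decoding, assuming the Hugging Face transformers API with gpt2 as a stand-in model. The only thing carried from one step to the next is the growing token sequence; the hidden activations are recomputed from (and fully determined by) those tokens on every pass.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in model, not a reasoning model
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Let's think step by step:", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits              # full forward pass over the token sequence
        next_id = logits[:, -1].argmax(-1)      # greedy pick; the hidden state is then discarded
        ids = torch.cat([ids, next_id[:, None]], dim=-1)  # the tokens are the only carried state

print(tok.decode(ids[0]))
```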
▲ | miven a day ago
I'm not sure I understand what you're trying to say here. Information between tokens is propagated through self-attention, and there's an attention block inside each transformer block within the model. That's a whole lot of internal state, stored in (mostly) inscrutable key and value vectors: hundreds of dimensions per attention head, a few dozen heads per attention block, and a few dozen blocks per model.
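For a rough sense of scale, with made-up but typical numbers in that range (not any specific model):

```python
# Per-token attention state (key + value vectors) for an illustrative 7B-class config.
head_dim = 128   # dimensions per attention head
n_heads  = 32    # heads per attention block
n_layers = 32    # transformer blocks per model
kv       = 2     # one key vector and one value vector per head

values_per_token = kv * head_dim * n_heads * n_layers
print(values_per_token)  # 262144 cached numbers of internal state per token of context
```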
| ||||||||||||||||||||||||||||||||||||||||||||
▲ | the_mitsuhiko 17 hours ago
> Of course we don't know what kind of information the model encodes in the specific token choices - i.e. the tokens might not mean to the model what we think they mean.

What I find interesting about this is that, for the most part, the reasoning output is something we can read and understand: the tokens as produced form English sentences and make intuitive sense. If we think of the reasoning output block as basically just "hidden state", then one could imagine that there might be a more efficient representation, one that trades human understanding for just priming the internal state of the model.

In some abstract sense you can already get that by asking the model to operate in different languages. My first experience with a reasoning model where you could see the output of the thinking block was, I think, QwQ, which reasoned in Chinese most of the time even when the final output was German. DeepSeek will sometimes keep reasoning in English even if you ask it something in German, and sometimes it does reason in German. All in all, there might be a more efficient representation of the internal state if one forgoes human-readable output.
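Purely as a sketch of what forgoing human-readable output could look like (this is not how today's reasoning models work, and an off-the-shelf model would need training for this to do anything useful; gpt2 and the step count are just stand-ins): instead of sampling a token and re-embedding it, feed the final hidden state straight back in as the next input embedding, so the "reasoning" never passes through readable tokens at all.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = tok("Question: ...", return_tensors="pt")      # toy placeholder prompt
embeds = model.get_input_embeddings()(prompt.input_ids)

with torch.no_grad():
    for _ in range(8):  # eight "latent reasoning steps"; nothing human-readable is produced
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]    # final-layer state at the last position
        embeds = torch.cat([embeds, last_hidden], dim=1)  # append it as the next "token"
```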
▲ | svachalek a day ago
Exactly. There's no state outside the context. The difference in performance between the non-reasoning model and the reasoning model comes from the extra tokens in the context. The relationship isn't strictly a logical one, just as it isn't for non-reasoning LLMs, but the process is autoregression and happens in plain sight.
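One toy way to see that (gpt2 as a stand-in and a trivial prompt; a real check would use a reasoning-tuned model): condition the same model on the same question with and without intermediate steps in the context and compare the probability it assigns to the answer. Any difference can only come from the extra tokens, because nothing else is carried over.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def prob_of_next(prefix, target):
    """Probability the model assigns to `target` as the token following `prefix`."""
    ids = tok(prefix, return_tensors="pt").input_ids
    target_id = tok(target, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        logits = model(ids).logits[:, -1]
    return torch.softmax(logits, -1)[0, target_id].item()

bare = "Q: 17 + 25 = ?\nA:"
with_steps = "Q: 17 + 25 = ?\nWork: add the tens (10 + 20 = 30) and the ones (7 + 5 = 12), then combine.\nA:"
print(prob_of_next(bare, " 42"), prob_of_next(with_steps, " 42"))
```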
▲ | comex a day ago
> Of course we don't know what kind of information the model encodes in the specific token choices - i.e. the tokens might not mean to the model what we think they mean.

But it's probably not that mysterious either. Or at least, this test doesn't show it to be so. For example, I doubt that the chain of thought in these examples secretly encodes "I'm going to cheat". It's more that the chain of thought is irrelevant: the model thinks it already knows the correct answer just by looking at the question, so the task shifts to coming up with the best excuse it can think of to reach that answer. But that doesn't say much, one way or the other, about how the model treats the chain of thought when it legitimately is relying on it.

It's like a young human taking a math test where you're told to "show your work". What I remember from high school is that the "work" you're supposed to show has strict formatting requirements, and may require you to use a specific method. Often there are other, easier methods to find the correct answer: for example, visual estimation in a geometry problem, or just using a different algorithm. So in practice you often figure out the answer first and then come up with the justification. As a result, your "work" becomes pretty disconnected from the final answer. If you don't understand the intended method, the "work" might end up being pretty BS while mysteriously still leading to the correct answer.

But that only applies if you know an easier method! If you don't, then the work you show will be, essentially, your actual reasoning process. At most you might neglect to write down auxiliary factors that hint towards or away from a specific answer. If some number seems too large, or too difficult to compute for a test meant to be taken by hand, then you might think you've made a mistake; if an equation turns out to unexpectedly simplify, then you might think you're onto something. You're not supposed to write down that kind of intuition, only concrete algorithmic steps. But the concrete steps are still fundamentally an accurate representation of your thought process.

(Incidentally, if you literally tell a CoT model to solve a math problem, it is allowed to write down those types of auxiliary factors, and probably will. But I'm treating this more as an analogy for CoT in general.)

Also, a model has a harder time hiding its work than a human taking a math test. On a math test you can write down calculations that don't end up being part of the final shown work. A model can't, so any hidden computations are limited to the ones it can do "in its head". Though admittedly those are very different from what a human can do in their head.