Majromax 2 hours ago

Thinking happens in latent space, but the thinking trace is then the projection of that thinking onto tokens. Since autoregressive generation involves sampling a specific token and continuing the process, that sampling step is lossy.

However, it is a genuine question whether the literal meaning of a thinking block matters more than its less-observable latent content. The latent state attributable to the last-generated thinking token is some combination of the actual token (its literal meaning) and the latent computation carried over from the thinking thus far. The latter does have some value: a 2024 paper (https://arxiv.org/abs/2404.15758) noted that simply adding filler dots to the output allowed some models to perform more latent computation, resulting in better answers on certain tasks. However, since this is not routine practice today, I suspect that genuine "thinking" steps have higher value.

Ultimately, your thesis can be tested. Take the output of a reasoning model inclusive of thinking tokens, then re-generate answers with:

1. Different but semantically similar thinking steps (e.g. synonyms or a summarization). This tests whether the model is encoding detailed information in the latent states of the specific tokens rather than in their meaning.

2. Meaningless thinking steps (dots or word salad), testing whether the model is performing detailed but purely latent computation, effectively ignoring the semantic content of the trace.

3. A semantically meaningful distraction (e.g. a thinking trace from a different question), testing whether the relevance of the trace matters beyond its mere coherence.

Look for where performance drops off the most. If the drop comes between 0 (the control) and 1, then the thinking trace is really just the trace of a latent magic spell, and its literal meaning is not important. If it comes between 1 and 2, then thinking traces serve a role roughly like a human's verbalized train of thought. If it comes between 2 and 3, then the role is mixed, leading back to the 'magic spell' theory but without the 'verbal' component being important.
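
If one wanted to run that test concretely, a minimal sketch might look like the following. It assumes an open-weights, DeepSeek-R1-style model whose reasoning is delimited by <think>...</think> tags; the model name, the tag handling, the toy question, and the hand-written traces are all placeholders, and a real run would harvest the control trace from the model's own output and score answers across a benchmark rather than a single prompt.

    # Sketch of the thinking-trace ablation. Assumptions: a <think>-tag reasoning
    # model (the name below is a placeholder), and that pre-filling the assistant
    # turn with "<think>...</think>" makes the model generate only the final answer.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed; substitute your model
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, torch_dtype=torch.bfloat16, device_map="auto"
    )

    def answer_with_injected_trace(question: str, trace: str, max_new_tokens: int = 128) -> str:
        """Inject a (possibly altered) thinking trace, then generate only the answer."""
        prompt = tokenizer.apply_chat_template(
            [{"role": "user", "content": question}],
            tokenize=False, add_generation_prompt=True,
        )
        # Some chat templates already append "<think>"; adjust the line below if so.
        prompt += f"<think>\n{trace}\n</think>\n"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    question = "What is 17 * 24?"
    control    = "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."              # 0: original-style trace
    paraphrase = "Split 24 into 20 and 4: 17*20 is 340, 17*4 is 68, so 408."  # 1: semantically similar
    filler     = "." * len(control)                                           # 2: meaningless dots
    distractor = "The capital of France is Paris, which sits on the Seine."   # 3: off-topic trace

    for label, trace in [("control", control), ("paraphrase", paraphrase),
                         ("filler", filler), ("distractor", distractor)]:
        print(f"{label}: {answer_with_injected_trace(question, trace)}")

Comparing accuracy across the four conditions over many questions is what would separate the 'latent magic spell' reading from the 'verbalized train of thought' reading.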