energy123 7 days ago
Performance is proportional to the number of reasoning tokens. How do you reconcile that with your opinion that they are "random words"?
kelipso 6 days ago
Technically, something random can still have probabilities associated with it. In casual speech, "random" means equal probabilities, or that we don't know the probabilities. But for LLM token output, the model does estimate the probabilities.
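A toy sketch of the distinction (plain Python, hypothetical logits for a made-up three-token vocabulary): the sampler is "random" only in the sense of drawing from a distribution the model has estimated, not a uniform one.

    import math
    import random

    # Hypothetical next-token logits from a model; higher = more likely.
    logits = {"Paris": 9.1, "London": 5.3, "banana": 0.2}

    # Softmax turns logits into the model's estimated probability distribution.
    total = sum(math.exp(v) for v in logits.values())
    probs = {tok: math.exp(v) / total for tok, v in logits.items()}

    # Uniform "random" would pick each token with probability 1/3;
    # the LLM instead samples according to its estimated probabilities.
    token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
    print(probs)   # ~{'Paris': 0.978, 'London': 0.022, 'banana': 0.0001}
    print(token)   # almost always 'Paris'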
blargey 6 days ago
s/random/statistically-likely/g

Reducing the distance of each statistical leap improves "performance", since you avoid the failure modes specific to the largest statistical leaps, but it doesn't change the underlying mechanism. Reasoning models still "hallucinate" spectacularly even with "shorter" gaps.
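One way to picture that argument, with made-up numbers: if each step succeeds independently with some probability, many small high-probability steps can beat one big low-probability leap, yet the chain is still a product of probabilities, so failure is never ruled out.

    # Hypothetical per-step success probabilities, chosen for illustration.
    big_leap = 0.60                    # one direct jump to the answer
    small_steps = [0.90, 0.90, 0.90]   # the same jump broken into three steps

    # Probability the whole chain of small steps succeeds.
    p_chain = 1.0
    for p in small_steps:
        p_chain *= p

    print(f"one big leap:      {big_leap:.3f}")   # 0.600
    print(f"three small steps: {p_chain:.3f}")    # 0.729 -- better, but
                                                  # same mechanism, still < 1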