CamperBob2 2 hours ago

If you ask a model to multiply 322423324 by 8675309232 without using tools, it's interesting to think about how it does it. Where are the intermediate results being maintained?
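For comparison, here is a minimal Python sketch of the bookkeeping a person (or a conventional program) does for this product. It uses schoolbook long multiplication and stores every partial product explicitly. It only illustrates the kind of intermediate state the question is about. It is not a claim about how a transformer actually computes the answer internally. The function name `long_multiply` is hypothetical and chosen for illustration.

```python
def long_multiply(a: int, b: int) -> tuple[int, list[int]]:
    """Schoolbook multiplication that keeps every intermediate partial product."""
    partials: list[int] = []
    # Walk the digits of b from least significant to most significant.
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        # Each partial product is a times one digit of b, shifted by that digit's place value.
        partials.append(a * digit * 10**place)
    # The final answer is the sum of the stored intermediate results.
    return sum(partials), partials


total, partials = long_multiply(322423324, 8675309232)
for p in partials:
    print(p)
print(total)
```

On paper, those ten partial products are the "intermediate results" that have to live somewhere before they are summed.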

"In context" is the obvious answer... but if you view the chain of thought from a reasoning model, it may have little or nothing to do with how the model actually arrives at the correct answer. It may even be complete nonsense. The model is working with tokens in context, but internally the transformer seems to be maintaining some state alongside those tokens that is independent of their superficial meanings. That is profoundly weird, and to me, it makes it difficult to draw a line in the sand between what LLMs can do and what human brains can do.
