swyx 3 hours ago

(editor here) yes, a central nuance i try to communicate is not that LLMs cannot have world models (in fact they've improved a lot) - it is just that they build them so inefficiently as to be impractical to scale - we'd have to scale them up by many more trillions of parameters, whereas human brains are capable of very good multiplayer adversarial world models on 20W of power and roughly 100T synapses (about 86B neurons).

naasking an hour ago

I agree LLMs are inefficient, but I don't think they are as inefficient as you imply. Human brains use a lot less power, sure, but they're also a lot slower and worse at parallelism. An LLM can write an essay in a few minutes that would take a human days. If you aggregate all the energy the human uses over that time you're looking at kWh, an order of magnitude or more above what the LLM used. And this doesn't even consider batch parallelism, which can further reduce energy use per request.
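
As a rough back-of-envelope sketch of that comparison: the 20 W figure is the one quoted in the parent comment, while the two-day writing time, the 1 kW server draw, and the three-minute generation time are illustrative assumptions, not measurements.

```python
# Back-of-envelope energy comparison; every input is an illustrative assumption.

def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kWh for a constant power draw over a duration."""
    return power_watts * hours / 1000.0

BRAIN_WATTS = 20.0      # brain power figure quoted in the parent comment
HUMAN_HOURS = 48.0      # assumption: "days" read as two full days
LLM_WATTS = 1000.0      # assumption: hypothetical 1 kW draw for one request
LLM_HOURS = 3.0 / 60.0  # assumption: "a few minutes" read as three minutes

human_kwh = energy_kwh(BRAIN_WATTS, HUMAN_HOURS)
llm_kwh = energy_kwh(LLM_WATTS, LLM_HOURS)
ratio = human_kwh / llm_kwh

print(f"human: {human_kwh:.2f} kWh, LLM: {llm_kwh:.2f} kWh, ratio: {ratio:.1f}x")
```

Under these assumptions the human side comes to about 1 kWh against about 0.05 kWh for the LLM. Counting the whole body rather than just the brain would raise the human figure, and batching would lower the per-request LLM figure, so the gap is sensitive to which inputs you pick.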

But I do think that there is further underlying structure that can be exploited. Recent work on geometric and latent interpretations of reasoning, geometric approaches to accelerating grokking, and linear replacements for attention all look like promising directions, and multimodal training will further improve semantic synthesis.