CamperBob2 2 hours ago

The "hot air" matters more than it first appears, because those initial tokens are the substrate the transformer uses for computation. Karpathy talks a little about this in some of his introductory lectures on YouTube.

Terr_ 2 hours ago

Related are "reasoning" models, where a stream of "hot air" is generated but never shown to the end user.

I analogize it to a film noir script: the hardboiled detective character has unspoken internal monologue, and if you ask some agent to "make this document longer", that hidden text gives it extra continuity to work with.
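That hidden-monologue split can be sketched in code. Some open reasoning models wrap the hidden stream in `<think>...</think>` tags in their raw output (a convention used by DeepSeek-R1-style chat templates; the tag names here are an assumption about that one convention, not a universal standard), and a serving layer strips it before the user sees the reply:

```python
import re

# Minimal sketch: separate a model's hidden reasoning stream from the
# visible answer, assuming the raw output wraps reasoning in
# <think>...</think> tags (one common convention, not a standard).

def split_reasoning(raw_output: str):
    match = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    if match:
        reasoning = match.group(1).strip()   # the "unspoken text"
        visible = raw_output[match.end():].strip()  # what the user sees
        return reasoning, visible
    return "", raw_output.strip()  # no tags: everything is visible

raw = "<think>User wants X; check constraints first.</think>Here is X."
hidden, shown = split_reasoning(raw)
print(shown)  # Here is X.
```

The detective's monologue lives in `hidden`; only `shown` reaches the end user, but both are in the context if the conversation continues.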