hackinthebochs 6 hours ago
LLMs are a general-purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or, as Karpathy puts it, an LLM is a differentiable computer[1]. Training an LLM discovers programs that reproduce the input sequences well. Roughly the same architecture can generate passable images, music, or even video. The sequence of matrix multiplications is the high-level constraint on the space of discoverable programs, but the specific parameters discovered determine how information flows through the network and hence what program is defined. The complexity of the trained network is emergent: the internal complexity far surpasses that of the coarse-grained description as a sequence of high-level matmuls. LLMs are not just matmuls and logits.
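A minimal sketch of the circuit-builder point (my own toy illustration, not anything from [1]): the architecture below is a fixed sequence of matmuls and a ReLU, yet which Boolean function the circuit computes is decided entirely by the parameters.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # The "architecture" is just this fixed matmul -> ReLU -> matmul pipeline.
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# One set of hand-picked parameters makes the circuit compute XOR...
xor_params = (np.array([[1., 1.], [1., 1.]]), np.array([0., -1.]),
              np.array([[1.], [-2.]]), np.array([0.]))
# ...another set makes the *same* circuit compute AND.
and_params = (np.array([[1., 0.], [1., 0.]]), np.array([-1., 0.]),
              np.array([[1.], [0.]]), np.array([0.]))

print(forward(inputs, *xor_params).ravel())  # [0. 1. 1. 0.] -> XOR
print(forward(inputs, *and_params).ravel())  # [0. 0. 0. 1.] -> AND
```

Training is just the search procedure that finds such parameters automatically; the matmul sequence only bounds the space of programs it can land on.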
otabdeveloper4 5 hours ago | parent
> LLMs are a general purpose computing paradigm.

Yes, so is logistic regression.