▲ | wizzwizz4 3 days ago | |
The architecture of the model does place limits on how much computation can be performed per token generated, though. Combined with the window size, that's a hard bound on computational complexity that's significantly lower than a Turing machine – unless you do something clever with the program that drives the model. | ||
▲ | vidarh 2 days ago | parent [-] | |
Hence the requirement for using the context for IO. A Turing machine requires two memory "slots" (the position of the read head, and the current state) + IO and a loop. That doesn't require much cleverness at all. |