vidarh | 3 days ago
That's fine. I've made no claim about any given training process. I've addressed the annoying, repetitive dismissal via the "but they're next token predictors" argument. The point is that being next token predictors does not cap what they can compute in theory, so it's a meaningless argument.
wizzwizz4 | 3 days ago
The architecture of the model does place limits on how much computation can be performed per token generated, though. Combined with the window size, that's a hard bound on computational complexity that's significantly lower than a Turing machine – unless you do something clever with the program that drives the model.
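To make the bound concrete, here is a minimal sketch, assuming deterministic (greedy) decoding and a context that is always exactly `WINDOW` tokens long. `toy_step` is a hypothetical stand-in for a model, not any real architecture. Because the next token depends only on the current window, the decoder is a finite-state machine with at most `VOCAB ** WINDOW` states, so it must revisit a context within that many steps and then repeat itself forever.

```python
def run_until_cycle(step, start, window):
    """Decode greedily with a fixed window until some context repeats.

    Returns (steps_before_cycle, cycle_length).
    """
    context = tuple(start[-window:])
    seen = {context: 0}  # context -> step index at which it first appeared
    n = 0
    while True:
        token = step(context)
        # Slide the window: append the new token, drop the oldest one.
        context = (context + (token,))[-window:]
        n += 1
        if context in seen:
            return seen[context], n - seen[context]
        seen[context] = n


def toy_step(context):
    # Hypothetical "model": any deterministic function of the window would do.
    return (sum(context) + 1) % 3


VOCAB, WINDOW = 3, 2
prefix, cycle = run_until_cycle(toy_step, [0, 0], WINDOW)

# Pigeonhole: the run can visit at most VOCAB ** WINDOW distinct contexts.
assert prefix + cycle <= VOCAB ** WINDOW
```

For real vocabulary and window sizes the bound is astronomically large, but it is still finite, which is the distinction from a Turing machine's unbounded tape. The sketch does not cover sampling, and it does not cover the "something clever" case, where the driving program gives the model external memory that outlives the window.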