vidarh 2 days ago
> Not all algorithms can be distributed effectively across multiple threads.

Sure, but talking about Turing completeness is not about efficiency, but about the computational ability of a system.

> THIS is the flaw of the Turing Completeness idea. Algorithms that REQUIRE the full feature set of Turing Completeness are in some cases extremely slow.

The "feature set" of Turing Completeness can be reduced to a loop, an array lookup, and an IO port. The point is not whether the algorithms require a Turing complete system, but that Turing completeness establishes an equivalence in the upper limit on the set of functions an architecture can compute, and that pretty much any meaningful architecture you come up with is still Turing complete.

> At this point, you've left the Turing Completeness paradigm fully behind. Though the real shift happened when you removed those requirements from your algorithm, not when you shifted the hardware.

If a system can take 3 bits of input and use them to look up 5 bits of output in a table of 30 bits of data, and it is driven by a driver that treats 1 bit as the current/new state, 1 bit as the direction to move the tape, and 3 bits as the symbol to read/write, and that driver processes the left/right/read/write tape operations and loops back, you have a Turing complete system (Wolfram's 2-state 3-symbol Turing machine). So no, you have not left Turing completeness behind, as any function that can map 3 bits of input to 5 bits of output becomes a Turing complete system if you can put a loop and an IO mechanism around it.

Again, the point is not that this is a good way of doing something, but that it serves to point out that what it takes to make an LLM Turing complete is utterly trivial.
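For what it's worth, a rough Python sketch of that shape (the transition table below is a stand-in, not Wolfram's actual (2,3) rules; the point is only that everything beyond a tiny lookup is a loop plus tape IO):

    from collections import defaultdict

    # Placeholder transition table: (state, symbol) -> (new_state, write_symbol, move).
    # 2 states x 3 symbols = 6 entries of 5 bits each, i.e. the "30 bits of data".
    TABLE = {
        (0, 0): (1, 1, +1), (0, 1): (0, 2, -1), (0, 2): (0, 1, -1),
        (1, 0): (0, 2, -1), (1, 1): (1, 2, +1), (1, 2): (0, 0, +1),
    }

    def run(steps, tape_init=None):
        tape = defaultdict(int, tape_init or {})  # unbounded tape, blank symbol 0
        state, head = 0, 0
        for _ in range(steps):                    # the driver loop
            symbol = tape[head]                   # read (the IO side)
            state, write, move = TABLE[(state, symbol)]  # the 3-bit -> 5-bit lookup
            tape[head] = write                    # write
            head += move                          # move the head left/right
        return dict(tape)

    print(run(20))

Swap the lookup for any other function with the same input/output shape (an LLM forward pass included) and the construction still goes through.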
trashtester 2 days ago
> Sure, but talking about Turing completeness is not about efficiency, but about the computational ability of a system.

I know. That is part of my claim that this "talking about Turing completeness" is a distraction, specifically because it ignores efficiency/speed.

> Again, the point is not that this is a good way of doing something, but that it serves as a way to point out that what it takes to make an LLM Turing complete is so utterly trivial.

And again, I KNOW that it's utterly trivial to create a Turing Complete system. I ALSO know that a Turing Complete system can perform ANY computation (it pretty much defines what a computation is), given enough time and memory/storage. But if such a calculation takes 10^6 times longer than necessary, it's also utterly useless to approach it in this way.

Specifically, the problem with Turing Completeness is that it implies the ability to create global branches/dependencies in the code based on the output of any previous computation step.

> The "feature set" of Turing Completeness can be reduced to a loop, an array lookup, and an IO port.

This model is intrinsically single threaded, so the global branches/dependencies requirement is trivially satisfied. Generally, though, if you want to be able to distribute a computation, you can pretty much never allow the result computed on one arbitrary thread to affect the next computation on any of the other threads. NOBODY would be able to train LLMs anything like the ones we see today if they were not willing to make that sacrifice.

Also downstream from this are the hardware optimizations needed to even run these algorithms. While you _could_ train any of the current LLMs on large CPU clusters, a direct port would require perhaps 1000x more hardware, electricity, etc. than running it on GPUs or TPUs. Not only that, but if the networks being trained (plus some amount of training data) couldn't fit into fast GPU/TPU memory during training, and instead had to be swapped in and out of system memory or disk, that would also cause orders of magnitude of slowdown, even when using GPUs/TPUs for the training.

In other words, what we're seeing is a trend towards ever increasing coupling between the algorithms being run and the hardware they run on.

So when I say that thinking in terms of Turing Completeness is a distraction, I don't mean it's wrong. It's just irrelevant.
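To make the distribution point concrete, a toy sketch (my framing, not anything from an actual training stack): a loop that branches on its own previous output cannot be split across workers, while a map with no cross-item dependencies distributes freely.

    from concurrent.futures import ProcessPoolExecutor

    def step(x):
        # per-item work with no dependency on any other item's result
        return (x * x) % 7919

    def serial_dependent(n):
        # Every iteration branches on the previous result: the "global
        # branch/dependency" case. There is no way to hand iterations
        # out to other workers.
        acc = 1
        for _ in range(n):
            acc = step(acc) + (1 if acc % 2 else 2)
        return acc

    def parallel_independent(xs):
        # No result feeds back into any other item's computation, so the
        # work can be spread across processes (or devices) freely.
        with ProcessPoolExecutor() as pool:
            return list(pool.map(step, xs))

    if __name__ == "__main__":
        print(serial_dependent(10))
        print(parallel_independent(range(10)))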