▲ | trashtester 2 days ago |
> My point is that this isn't true. Every computer you've ever used is a Turing complete system, and there is no need for such a system to be single-threaded, as a multi-threaded system can simulate a single-threaded system and vice versa.

Not all algorithms can be distributed effectively across multiple threads. A computer can have 1000 CPU cores and still only be able to use a single one when running such algorithms.

Some other algorithms can be partially parallelized through branch prediction, by speculatively running future processing steps ahead of time for each possible outcome of the current step. In fact, modern CPUs already do this a lot to speed up processing. But even branch prediction hits something like a logarithmic wall of diminishing returns.

So while you are right that multi-core CPUs (or whole data centers) can run such algorithms, that doesn't mean they can run them quickly, hence my claim:

>> My point is that any such system is extremely limited due to how slow [it would be].

Algorithms that can only utilize a single core seem to be stuck at the GFLOPS scale, regardless of what hardware they run on. Even if only a small percentage (like 5%) of the code in a program is inherently single-threaded, that serial part caps overall throughput (at best, you will achieve TFLOPS numbers; see the sketch at the end of this comment). This imposes a fatal limitation on computational problems that require very large amounts of computing power, for instance at the ExaFLOP scale or higher.

THIS is the flaw of the Turing Completeness idea: algorithms that REQUIRE the full feature set of Turing Completeness are in some cases extremely slow.

So if you want to do calculations that require, say, 1 ExaFLOP (about the raw compute of the human brain) to be fast enough for a given purpose, you need to make almost all compute steps fully parallelizable. Once you've restricted your algorithm so it no longer requires all the features of a Turing Complete system, you CAN still run it on Turing Complete CPUs. You're just not MAKING USE of their Turing Completeness. That's just very expensive.

At that point, you may as well build dedicated hardware that drops the optimizations CPUs have for single-threaded computation, like GPUs or TPUs, and lower your compute cost by 10x, 100x or more (which could be the difference between $500 billion and $5 billion). At this point, you've left the Turing Completeness paradigm fully behind. Though the real shift happened when you removed those requirements from your algorithm, not when you switched the hardware.

One way to describe this: from the space of all possible algorithms that can run on a Turing Complete system, you've selected a small sub-space of algorithms that can be parallelized. By making this trade, you've severely limited what algorithms you can run, in exchange for a speedup of 6 orders of magnitude, or more in many cases. And in order to keep this speedup, you also have to accept other hardware-based limitations, such as staying within the amount of GPU memory available, etc.

Sure, you COULD train GPT-5 or Grok-3 on a C-64 with infinite cassette tape storage, but the training would take virtually forever to finish. So that fact has no practical utility.

I DO realize that the concept of the equivalence of all Turing Complete systems is very beautiful. But it CAN be a distraction, and lead to intuitions that seem to me to be completely wrong. Like Deutsch's idea that the ideas in a human brain are fundamentally linked to the brain's Turing Completeness.
While in reality, it takes years of practice for a child to learn to be Turing Complete at all, and even then the child's brain will struggle to complete a floating point calculation every 5 minutes. Meanwhile, joint systems of algorithms and the hardware they run on can do very impressive calculations precisely by ditching some of the requirements of Turing Completeness.
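To put rough numbers on the serial-bottleneck argument above, here is a minimal sketch of Amdahl's law in Python. The constants are illustrative assumptions of mine (one core sustaining 10 GFLOPS, 5% of the work inherently serial), not figures from the thread:

    # Amdahl's law with illustrative numbers: one core sustains
    # 10 GFLOPS (assumed) and 5% of the work is inherently serial.
    SINGLE_CORE_GFLOPS = 10.0
    SERIAL_FRACTION = 0.05

    def effective_gflops(cores):
        """Whole-job throughput: the serial part runs on one core,
        the rest is spread perfectly across all cores."""
        serial_time = SERIAL_FRACTION / SINGLE_CORE_GFLOPS
        parallel_time = (1 - SERIAL_FRACTION) / (SINGLE_CORE_GFLOPS * cores)
        return 1.0 / (serial_time + parallel_time)

    for cores in (1, 100, 10_000, 1_000_000):
        print(f"{cores:>9,} cores -> {effective_gflops(cores):6.1f} GFLOPS")
    # No core count gets past SINGLE_CORE_GFLOPS / SERIAL_FRACTION
    # = 200 GFLOPS here -- and an ExaFLOP is 10^9 GFLOPS.

Under these assumptions even a million cores converge on 200 GFLOPS; shrinking the serial fraction, not adding hardware, is the only way out.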
▲ | vidarh 2 days ago | parent |
> Not all algorithms can be distributed effectively across multiple threads.

Sure, but Turing completeness is not about efficiency; it is about the computational ability of a system.

> THIS is the flaw of the Turing Completeness idea: algorithms that REQUIRE the full feature set of Turing Completeness are in some cases extremely slow.

The "feature set" of Turing completeness can be reduced to a loop, an array lookup, and an IO port. The point is not whether the algorithms require a Turing complete system, but that Turing completeness proves the equivalence of the upper limit on the set of functions an architecture can compute, and that pretty much any meaningful architecture you come up with is still Turing complete.

> At this point, you've left the Turing Completeness paradigm fully behind. Though the real shift happened when you removed those requirements from your algorithm, not when you switched the hardware.

If a system can take 3 bits of input and use them to look up 5 bits of output in a table of 30 bits of data, and it is driven by a driver that treats 1 bit as the current/new state, 1 bit as the direction to move the tape, and 3 bits as the symbol to read/write, and that driver processes the left/right/read/write tape operations and loops back, you have a Turing complete system (Wolfram's 2-state 3-symbol Turing machine).

So no, you have not left Turing completeness behind, as any function that can map 3 bits of input to 5 bits of output becomes a Turing complete system if you can put a loop and an IO mechanism around it (see the sketch below). Again, the point is not that this is a good way of doing something, but that it serves as a way to point out that what it takes to make an LLM Turing complete is utterly trivial.
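To make the "lookup table plus driver loop" shape concrete, here is a minimal sketch in Python. The rule table is the classic 2-state, 2-symbol busy beaver rather than Wolfram's 2,3 machine (whose exact rule table I'm not reproducing here); everything outside the RULES dict is the trivial driver described above.

    # A table-driven Turing machine: the whole "computer" is one
    # dict lookup plus a loop. The table is the 2-state busy beaver,
    # standing in for Wolfram's 2,3 machine to show the shape.
    from collections import defaultdict

    # (state, symbol read) -> (symbol to write, head move, next state)
    RULES = {
        ("A", 0): (1, +1, "B"),
        ("A", 1): (1, -1, "B"),
        ("B", 0): (1, -1, "A"),
        ("B", 1): (1, +1, "HALT"),
    }

    tape = defaultdict(int)        # unbounded tape of blanks (0)
    head, state, steps = 0, "A", 0

    while state != "HALT":         # driver: look up, write, move, loop
        write, move, state = RULES[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1

    print(f"halted after {steps} steps with {sum(tape.values())} ones")
    # -> halted after 6 steps with 4 ones

Swap in a bigger table (or a function that computes the same few bits, which is where an LLM could slot in) and the driver does not change at all; that is the whole point.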
|