vidarh | 16 hours ago
> I don't think you understood what I was writing. I wasn't saying that either the LLM (finished product OR the machines used for training them) were not Turing Complete. I said it was irrelevant.

Why do you think it is irrelevant? It is what allows us to say with near certainty that dismissing the potential of LLMs to be made to reason is unscientific and irrational.

> I'm not talking about shortcuts. When I talk about sacrificing, I'm talking about algorithms that you can run on any Turing Complete machine that are (to our knowledge) fundamentally impossible to distribute properly, regardless of shortcuts.

But again, we've not sacrificed the ability to run those.

> Which, to the degree that it's true, is irrelevant for the reason that I'm saying Turing Completeness is a distraction. You're not likely to run algorithms that require 10^20 to 10^25 steps within the context of an LLM.

Maybe, maybe not. Inference is expensive today, but we are already running plenty of algorithms that require many runs, and that number is steadily increasing as inference speed relative to network size improves.

> On the other hand, if you make a cluster to train LLMs that is explicitly NOT Turing Complete (it can be designed to refuse to run code that is not fully parallel, to avoid costs in the millions just to have a single CUDA run activated, for instance), it can still be just as good at its dedicated task (training LLMs).

And? The specific code used to run training has no relevance to the limitations of the model architecture.
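To make the "many runs" point concrete, here is a minimal sketch. The `llm_step` and `halted` callables are hypothetical placeholders (nothing from this thread); the idea is only that driving a model in a loop, feeding each output back as the next input, lets the overall system take as many steps as you are willing to pay for, even though any single forward pass is bounded.

    # Minimal sketch, assuming hypothetical `llm_step` (one inference call:
    # textual state -> next state) and `halted` (stop condition) callables.
    from typing import Callable

    def run_chained_inference(
        llm_step: Callable[[str], str],
        halted: Callable[[str], bool],
        state: str,
        max_runs: int = 10_000,
    ) -> str:
        """Drive an LLM in a loop, feeding each output back as the next input."""
        for _ in range(max_runs):
            if halted(state):
                break
            state = llm_step(state)  # each run is one bounded forward pass
        return state

Whether any given algorithm is practical this way is a cost question, not a computability one, which is the distinction being argued above.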
trashtester | 15 hours ago
See my other response above; I think I've identified which part of my argument was unclear. The update may still contain claims you disagree with, but those are specific and (at some point in the future) probably testable.