bubblyworld | 3 days ago
> CoT models can, in principle, solve _any_ complex task.

The authors explicitly discuss the expressive power of transformers and CoT in the introduction. They can only solve problems in a fairly restrictive complexity class (lower than PTIME!) - it's one of the theoretical motivations for the new architecture: "The fixed depth of standard Transformers places them in computational complexity classes such as AC0 [...]"

This architecture, by contrast, is recurrent, with inference time controlled by the model itself (there's a small Q-learning-based subnetwork that decides the halting time as it "thinks"), so there's no such limitation. The main meat of the paper is describing how to train this architecture efficiently, as that has historically been the issue with recurrent nets.
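For intuition, here's a minimal sketch of what "model-controlled halting" could look like - names, shapes, and the greedy halting rule are my own placeholders, not the paper's actual code, which trains the Q-head with a proper Q-learning target:

    # Sketch: a recurrent reasoning loop whose stopping point is chosen by a
    # small Q-head, rather than by a fixed depth. Assumed/hypothetical names.
    import torch
    import torch.nn as nn

    class RecurrentReasoner(nn.Module):
        def __init__(self, d_model=256, max_steps=16):
            super().__init__()
            self.max_steps = max_steps
            # Recurrent "thinking" cell: the same weights are applied at every step.
            self.cell = nn.GRUCell(d_model, d_model)
            # Q-head estimates the return of two actions: [continue, halt].
            self.q_head = nn.Linear(d_model, 2)
            self.readout = nn.Linear(d_model, d_model)

        def forward(self, x):
            h = torch.zeros_like(x)
            for step in range(self.max_steps):
                h = self.cell(x, h)          # one more step of latent computation
                q = self.q_head(h)           # q[:, 0] = continue, q[:, 1] = halt
                # Greedy halting decision at inference time (batch-averaged here
                # purely for simplicity of the sketch).
                if q[:, 1].mean() > q[:, 0].mean():
                    break
            return self.readout(h), q

    # Usage: the Q-head would be trained against a Q-learning target (e.g. task
    # correctness at the halting step minus a small per-step cost), while the
    # cell and readout are trained on the task loss at the chosen step.
    model = RecurrentReasoner()
    y, q = model(torch.randn(4, 256))

The point of the sketch is just that compute per input is no longer a fixed constant - the network can keep iterating on hard instances and stop early on easy ones.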
malcontented | 3 days ago
Agreed regarding the limited computational power of CoT LLMs, and this approach certainly has much more flexibility. But is there a reason to believe that this architecture (and training method) is as applicable to the development of generally-capable models as it is to solving individual puzzles? Don't get me wrong, this is a cool development, and I would love to see how this architecture behaves on a constraint-based problem that isn't easily tractable via a traditional algorithm.