malcontented | 3 days ago
I appreciate the connections with neurology, and the paper itself doesn't ring any alarm bells; I don't think I'd reject it if it fell to me to peer review. However, I have extreme skepticism when it comes to the applicability of this finding.

Based on what they have written, they seem to have created a universal (or at least adaptable) constraint-satisfaction solver that learns the rules of the constraint-satisfaction problem from a small number of examples. If true (I have not yet had the leisure to replicate their examples and try them on something else), this is pretty cool, but I do not understand the comparison with CoT models. CoT models can, in principle, attempt _any_ complex task; this one needs to be trained on a specific puzzle, which it can then solve. It makes no pretense to universality, and it isn't even clear that it is meant to be capable of adapting to any given puzzle. I suspect it is not, based on what I have read in the paper and on the indicative choice of examples they tested it against.

This is like claiming that Stockfish is way smarter than current state-of-the-art LLMs because it can beat the stuffing out of them at chess. I feel the authors have a good idea here, but that they have marketed it a bit too... generously.
jurgenaut23 | 3 days ago | parent | next
Yes, I agree, but this is a huge deal in and of itself. I suppose the authors had to frame it this way for obvious reasons of hype surfing, but it is an amazing achievement, especially given the small size of the model! I'd rather use a customized model for a specific problem than a supposedly "generally intelligent" model that burns orders of magnitude more energy for much less reliability.
JBits | 3 days ago | parent | prev | next
> CoT models can, in principle, solve _any_ complex task.

What is the justification for this? Is there a mathematical proof? To me, CoT seems like a hack to work around the severe limitations of current LLMs.
bubblyworld | 3 days ago | parent | prev
> CoT models can, in principle, solve _any_ complex task.

The authors explicitly discuss the expressive power of transformers and CoT in the introduction. They can only solve problems in a fairly restrictive complexity class (lower than PTIME!); it's one of the theoretical motivations for the new architecture:

"The fixed depth of standard Transformers places them in computational complexity classes such as AC0 [...]"

This architecture, by contrast, is recurrent, with inference time controlled by the model itself (there's a small Q-learning-based subnetwork that decides halting time as it "thinks"), so there's no such limitation. The main meat of the paper is describing how to train this architecture efficiently, as that has historically been the issue with recurrent nets.
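To make the "model-controlled inference time" idea concrete, here is a minimal toy sketch of a recurrent update whose number of iterations is decided at inference time by a learned halting signal. This is an illustration only, not the paper's implementation: the paper uses a small Q-learning subnetwork to decide halting, whereas the sigmoid scorer below is a hypothetical stand-in.

```python
import math
import random

def matvec(W, v):
    # plain matrix-vector product over Python lists
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def run(x, W, w_halt, threshold=0.9, max_steps=50):
    """Iterate a recurrent update until a learned halting score fires."""
    state = [0.0] * len(x)
    steps = 0
    for _ in range(max_steps):
        # one recurrent "thinking" step
        state = [math.tanh(h + xi) for h, xi in zip(matvec(W, state), x)]
        steps += 1
        # scalar halting score in (0, 1); stop once it clears the threshold.
        # Harder inputs can take more steps, so computation depth is not
        # fixed the way it is in a standard fixed-depth Transformer.
        score = 1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w_halt, state))))
        if score > threshold:
            break
    return state, steps

random.seed(0)
d = 4
W = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]
w_halt = [random.gauss(0, 1) for _ in range(d)]
x = [random.gauss(0, 1) for _ in range(d)]
state, steps = run(x, W, w_halt)
```

The training difficulty the comment mentions comes from exactly this loop: backpropagating through a variable, model-chosen number of steps is what recurrent nets have historically struggled with.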