vlovich123 an hour ago
You’re just complaining they can’t prove a negative, which is literally impossible. “I can accept fairies don’t exist today but that doesn’t mean fairies won’t exist in the future.” The burden of proof lies with those claiming the transformer is able to do something like this. In fact, given that our brains don’t have anything resembling transformers, they don’t learn anything like the way we train models, and they have all sorts of integrated memory mechanisms that we merely imitate with party tricks built on vector databases, I think it’s safer to err on the side of assuming that existing transformers fail in very specific ways that human brains generally do not. Also, we clearly haven’t really seen major architectural changes for transformers for a few years now. Most of it has been RL gains, not structural improvements. So it stands to reason that the deficiencies will remain even if we figure out ways to paper over it on a case by case basis.
quotemstr an hour ago
Yes, I am complaining that they are making an impossibility claim on the basis of an observational gap. Such claims don't have a great track record in the history of science.

> negative, which is literally impossible.

Impossibility proofs are common in mathematics, physics, and computer science. This paper is not one of them. It reports an observational gap. That's not at all the same thing as showing, e.g., that any transformer, no matter how large or interconnected, can't compute some function.

> our brains

Airliners don't have feathers.

> we clearly haven’t really seen major architectural changes for transformers for a few years now.

Ever read a DeepSeek paper? Ever hear of MLA? Mamba? Or Gated DeltaNet? Or RLMs? Universal transformers? There's been a deluge of architectural advancement over the past few years. You shouldn't go around asserting the burden of proof falls on this or that party if you're not familiar enough with the recent literature to recognize the kinds of proof that would satisfy this burden.

> deficiencies will remain even if we figure out ways to paper over it on a case by case basis.

I think there are general solutions unknown to us for classes of problem we solve one by one through brute force today. Not arguing that. I just don't accept that the path to generality goes through giving up "transformers", whatever this term means after the architectural Cambrian explosion of the past few years. It's much more likely that further capability unlocks involve in-the-weights continuous online learning. How we do that is orthogonal to whether the weights encode a transformer, a diffusion model, an SSM, or something more exotic.

Sure, these things aren't pure transformers. But neither are frontier models. The industry is already doing what you suggest and moving beyond naive KQ dot product full depth everywhere 2010s-era transformers. Architectural innovation hasn't solved the problem.
Turns out different architectures for approximating functions all form function approximators. The problem is in formulating the functions we want to approximate, not our spelling of the approximation engine!