Remix.run Logo
mcdeltat 3 hours ago

Ok I don't really care either way but to play devil's advocate, what exactly is this specific challenge of adding numbers with a transformer model demonstrating/advancing? The pushpack from people, albeit a little aggressive, does have a grain of truth. We're demonstrating that a model which uses preexisting addition instructions can add numbers? I mean yeah you can do it with arbitrarily few parameters because you don't need a machine learning model at all. Not exactly groundbreaking so I reckon the debate is fair.

Now if you said this proof of addition opens up some other interesting avenue of research, sure.

Lerc 2 hours ago | parent [-]

>what exactly is this specific challenge of adding numbers with a transformer model demonstrating/advancing?

Well for starters, it puts the lie to the argument that a transformer can only output examples it has seen before. Performing the calculation on examples that haven't been seen demonstrates generalisation of the principles and not regurgitation.

While this misconception persists in a large number of people, counterexamples can always serve a useful purpose.

mcdeltat 8 minutes ago | parent | next [-]

Are people usually claiming that it strictly cannot produce any output it hasn't seen before? I wouldn't agree, I mean clearly they are generating some form of new content. My argument would be that while they can learn to some extent, the power of their generalisation is still tragically weak, particularly in some domains.

qsera an hour ago | parent | prev [-]

>it puts the lie to the argument

But it does not, right? You can either show it something, or modify the parameters in a way that resemble the result of showing it something.

You can claim that the model didn't see the thing, but that would mean nothing, because you are making the same effect with parameter tweaks indirectly.