hyperhello 4 hours ago

From context, then, I infer that a transformer is not composed of matrix multiplications, because then it would simply be one that adds two 10-digit numbers.

medi8r 4 hours ago | parent [-]

A transformer tokenizes its input, then applies a bunch of matmuls and ReLUs arranged in a certain way. It never gets to see the raw number (just as you don't when you look at 1+1; your visual cortex has to process it first).
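A minimal sketch of that pipeline, with toy sizes and random weights purely for illustration (none of this is a real transformer, just tokenize → embed → matmul + ReLU):

```python
import numpy as np

# Toy vocabulary and dimensions: illustrative assumptions, not real model values.
VOCAB = "0123456789+"
EMBED_DIM = 8

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(VOCAB), EMBED_DIM))
W1 = rng.normal(size=(EMBED_DIM, 16))
W2 = rng.normal(size=(16, EMBED_DIM))

def tokenize(text):
    """Map each character to an integer id; the model never sees a number."""
    return [VOCAB.index(ch) for ch in text]

def relu(x):
    return np.maximum(x, 0.0)

def forward(text):
    ids = tokenize(text)
    x = embeddings[ids]   # (seq_len, EMBED_DIM): one vector per token
    h = relu(x @ W1)      # matmul + ReLU
    return h @ W2         # another matmul

out = forward("12+34")
print(out.shape)  # (5, 8): one vector per token, never a single integer
```

The point the example makes concrete: after `tokenize`, "12+34" is just the id list `[1, 2, 10, 3, 4]`; everything downstream is linear algebra over those embeddings.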

Lerc 3 hours ago | parent | next [-]

Notably, the difference is that ten digits are not the same thing as a number. One might say that turning them into a number would be the first step, but neural nets being what they are, they are liable to produce the correct result without bothering to form a representation any purer than a list of digits.

I guess the analogy there is that a 74LS283 never really has a number either; it just manipulates a set of logic levels.
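A quick sketch of that point: the routine below adds two equal-length digit lists purely by pushing symbols and a carry around, never forming an integer for either operand, much as the adder chip only shuffles logic levels:

```python
def add_digit_lists(a, b):
    """Add two equal-length numbers given as digit lists, most significant
    digit first. (Equal length is an assumption to keep the sketch short.)
    No int() conversion anywhere: only digit-by-digit manipulation."""
    result, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        s = da + db + carry
        result.append(s % 10)   # keep the low digit
        carry = s // 10         # propagate the carry
    if carry:
        result.append(carry)
    return list(reversed(result))

print(add_digit_lists([9, 9, 9], [1, 0, 1]))  # [1, 1, 0, 0], i.e. 999 + 101
```

A network that learns something like this carry logic can answer correctly while only ever representing digit lists, which is Lerc's point.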

Filligree 3 hours ago | parent | prev [-]

So the question is: why do we tokenise numbers in a way that makes everything harder?