hansmayer 2 days ago

Did they start correctly counting the number of 'R's in 'strawberry'?

SkiFire13 2 days ago | parent | next [-]

Most likely yes; that prompt has been repeated so many times online that LLMs were bound to pick up the right answer (or be specifically trained on it!). You'll have to try a different word to make them fail.

hansmayer 2 days ago | parent [-]

Well that's kind of the problem though, isn't it? All that effort for the machine to sometimes correctly draw the regression line between the right and wrong answers in order to solve a trivial problem. A six-year-old kid would only need to learn the alphabet before being able to count it all on their own. Do we even realise how ridiculous these 'successes' sound? A machine we have to "train" how to count letters is supposed to take over work that is orders of magnitude more complex? It's a classic solution looking for a problem, if I've ever seen one.

pulvinar 2 days ago | parent | prev | next [-]

Not as long as they use tokens -- it's a perception limitation of theirs. Like our blind spot, or the Müller-Lyer illusion, or the McGurk effect, etc.

comeonbro 2 days ago | parent | prev [-]

Imagine if I asked you how many '⊚'s are in 'Ⰹ⧏⏃'? (the answer is 3, because there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃)

That's a much harder question than if I asked you how many '⟕'s are in 'Ⓕ⟕⥒⟲⾵⟕⟕⢼' (the answer is 3, because there are 3 ⟕s there)

You'd need to read through something like 100,000x more random internet text to infer that there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃ (since this is not something people ever explicitly talk about) than you would need to figure out that there are 3 ⟕s when 3 ⟕s appear, or to figure out from context clues that Ⰹ⧏⏃s are red and edible.

The former is how tokenization makes 'strawberry' look to LLMs: https://i.imgur.com/IggjwEK.png

It's a consequence of an engineering tradeoff, not a demonstration of a fundamental limitation.
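To make the point concrete, here's a toy sketch (not a real tokenizer; the vocabulary, token IDs, and per-token letter counts are all made up for illustration) of how multi-character tokens hide letters from the model. Once 'strawberry' becomes three opaque IDs, counting 'r's requires memorized facts about each ID rather than simply looking at characters:

```python
# Hypothetical mini-vocabulary: multi-character chunks -> opaque token IDs,
# loosely in the spirit of BPE. The split 'str'/'aw'/'berry' is invented.
vocab = {"str": 101, "aw": 102, "berry": 103}

def tokenize(word, vocab):
    """Greedy longest-match split of a word into token IDs."""
    ids, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids

# The model sees only the IDs; the letters inside each chunk are gone.
print(tokenize("strawberry", vocab))  # [101, 102, 103]

# Counting 'r's from IDs needs a memorized fact per token -- exactly the
# kind of knowledge that almost no training text spells out explicitly.
r_counts = {101: 1, 102: 0, 103: 2}  # 'str' -> 1, 'aw' -> 0, 'berry' -> 2
print(sum(r_counts[t] for t in tokenize("strawberry", vocab)))  # 3
```

A character-level reader gets the same answer for free with `word.count('r')`; the token-level one has to have learned the `r_counts` table from data.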

hansmayer a day ago | parent [-]

I get the technical challenge. It's just that a system which has to be trained on petabytes of data, just to (sometimes) correctly solve a problem a six- or seven-year-old kid can solve after learning to spell, may not be the right solution to the problem at hand. Haven't the MBAs been shoving it down our throats that all cost-ineffective solutions have to go? Why are we burning hundreds of billions of dollars on developing tools whose most common use-case (or better said: the VC investors' plea) is a) summarising emails (I am not an idiot who cannot read) and b) writing emails (really, I know how to write too, and can do it better)? The only use-case where they are sometimes useful is taking the boring parts out of software development, because of the relatively closed learning context, and as someone who has used them for over a year for this: they are not reliable and have to be double-checked, lest you introduce more issues into your codebase.

comeonbro 33 minutes ago | parent [-]

It's not a technical challenge in this case, it's a technical tradeoff. You could train an LLM with single characters as the atomic unit and it would count the 'r's in 'strawberry' no problem. The tradeoff is that processing the word 'strawberry' would then take 10 sequential steps -- 10 complete runs through the entire LLM, each of which has to finish before the next can start.

Instead, they're almost always trained with (what we see as, but they literally do not) multi-character tokens as the atomic unit, so 'strawberry' is spelled 'Ⰹ⧏⏃'. Processing that takes only 3 sequential steps, only 3 complete runs through the entire LLM. But the model needs to encounter enough relevant text in training to figure out that 'Ⰹ' somehow has 1 'r' in it, '⧏' has 0 'r's, and '⏃' has 2 'r's -- something very little text actually demonstrates -- before it can count the 'r's in 'Ⰹ⧏⏃' correctly.

The tradeoff is: everything being 3-5x slower and more expensive (but you can count the 'r's in 'strawberry'), versus being bad at basically only character-level tasks like counting letters in words.
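The cost side of that tradeoff can be sketched numerically. This is a back-of-the-envelope illustration, assuming the common rule of thumb of roughly 4 characters per token for English text in BPE-style vocabularies (the exact ratio varies by tokenizer and language):

```python
# Each atomic unit costs one full sequential pass through the model,
# so sequence length is a direct proxy for latency and compute.
sentence = "strawberry fields forever"

char_steps = len(sentence)  # character-level: one pass per character

# Assumed ~4 chars/token, a rough average for English BPE vocabularies.
approx_token_steps = max(1, round(len(sentence) / 4))

print(char_steps, approx_token_steps)  # 25 vs 6: roughly 4x more passes
```

That ~4x factor in sequential passes is where the "3-5x slower and more expensive" estimate above comes from.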

Easy choice, but it leads to this stupid misunderstanding being absolutely everywhere, and that by itself does an enormous amount of damage to people's ability to understand what is happening and about to happen.