Zigurd 6 hours ago

Based on the code that it's good at, and the code that it's terrible at, you are exactly right about LLMs being shaped by their training material. If this is a fundamental limitation, I really don't see general-purpose LLMs progressing beyond their current status as idiot savants. They are confident in the face of not knowing what they don't know.

Your experience with Arabic in particular makes me think there's still a lot of training material to be mined in languages other than English. I suspect the reason the Arabic output sounds 20 years out of date is that there's a data-labeling bottleneck in using foreign-language material.

parineum 5 hours ago | parent

I've had a suspicion for a while that, since a large portion of the Internet is in English and Chinese, other languages end up with a much larger share of their training material coming from books.

I wouldn't be surprised if Arabic in particular had this issue, and if a disproportionate amount of its source material were religious text.

I bet you'd see something similar with Hebrew.

mentalgear an hour ago | parent | next

I think therein lies another fun benchmark to show that LLMs don't generalize: ask the LLM to solve the same logic riddle in different languages. If it can solve it in some languages but not in others, that's a strong argument for straightforward memorization and next-token prediction rather than true generalization.
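
As a rough illustration, here's a minimal Python sketch of how such a check might be wired up. Everything in it is an assumption for the sake of the example, not anything from this thread: ask_model is a placeholder for whatever LLM client you use, extract_number is a naive answer parser, the snail-in-a-well riddle is just a stand-in puzzle, and the non-English prompts are left as TODOs so nothing is machine-translated.

    import re

    def ask_model(prompt: str) -> str:
        """Hypothetical hook: call your LLM provider here and return its reply."""
        raise NotImplementedError("wire this up to a real model API")

    def extract_number(reply: str) -> int | None:
        """Naive answer parser: take the last integer mentioned in the reply."""
        numbers = re.findall(r"-?\d+", reply)
        return int(numbers[-1]) if numbers else None

    # The same riddle with the same expected answer (8 days for the classic
    # snail-in-a-well puzzle); non-English versions are left as TODOs rather
    # than machine-translated.
    RIDDLE = {
        "en": ("A snail is at the bottom of a 10 m well. Each day it climbs 3 m "
               "and each night it slips back 2 m. After how many days does it escape?"),
        "ar": "TODO: vetted Arabic translation of the same riddle",
        "he": "TODO: vetted Hebrew translation of the same riddle",
        "zh": "TODO: vetted Chinese translation of the same riddle",
    }
    EXPECTED = 8

    def run_benchmark() -> dict[str, bool]:
        """Ask the riddle in each language and report whether the model got it right."""
        results = {}
        for lang, prompt in RIDDLE.items():
            if prompt.startswith("TODO"):
                continue  # skip languages without a vetted translation
            reply = ask_model(prompt)
            results[lang] = extract_number(reply) == EXPECTED
        return results

    if __name__ == "__main__":
        print(run_benchmark())  # e.g. {'en': True, 'ar': False, ...} once wired up

The interesting signal is divergence: if the per-language results differ on a riddle whose translations are faithful, that points to the model leaning on language-specific memorized patterns rather than a language-independent solution.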
