parineum 5 hours ago
I've had a suspicion for a while that, since such a large portion of the Internet is in English and Chinese, other languages would have a much larger ratio of their training material coming from books. I wouldn't be surprised if Arabic in particular had this issue, and if Arabic also had a disproportionate amount of religious text as source material. I bet you'd see something similar with Hebrew.
mentalgear an hour ago
I think therein lies another fun benchmark to show that LLMs don't generalize: ask the LLM to solve the same logic riddle, only in different languages. If it can solve it in some languages but not in others, that's a strong argument for straightforward memorization and next-token prediction rather than true generalization.
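A minimal sketch of that check, assuming only a generic ask_llm wrapper around whatever model is being tested (the function name and the rough riddle translations here are placeholders, not any particular API):

    # Cross-lingual consistency check: same riddle, different surface language.
    # ask_llm() is a hypothetical stand-in for the model call under test;
    # the non-English translations are rough and only for illustration.

    RIDDLE = {
        "en": "A farmer has 17 sheep. All but 9 run away. How many are left?",
        "de": "Ein Bauer hat 17 Schafe. Alle bis auf 9 laufen weg. Wie viele bleiben?",
        "ar": "لدى مزارع 17 خروفًا، هربت كلها ما عدا 9. كم بقي منها؟",
    }
    EXPECTED = "9"

    def ask_llm(prompt: str) -> str:
        """Placeholder: send the prompt to the model under test, return its answer."""
        raise NotImplementedError

    def cross_lingual_consistency() -> dict[str, bool]:
        # A model that truly generalizes should solve every translation or none;
        # a split result suggests memorization of the dominant-language phrasing.
        return {lang: EXPECTED in ask_llm(text) for lang, text in RIDDLE.items()}

    for lang, correct in cross_lingual_consistency().items():
        print(f"{lang}: {'pass' if correct else 'fail'}")

Averaging over a batch of riddles per language, rather than a single one, would separate genuine capability gaps from noise on any one item.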