parineum 5 hours ago
I've had a suspicion for a while that, since such a large portion of the Internet is in English and Chinese, other languages would have a much larger ratio of their training material coming from books. I wouldn't be surprised if Arabic in particular had this issue, and if Arabic also had a disproportionate amount of religious text as source material. I bet you'd see something similar with Hebrew.
mentalgear an hour ago
I think therein lies another fun benchmark to show that LLMs don't generalize: ask the LLM to solve the same logic riddle, only in different languages. If it can solve it in some languages but not in others, that's a strong argument for straightforward memorization and next-token prediction rather than true generalization.
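A minimal sketch of that check, assuming only a generic ask_llm wrapper around whatever model is being tested (the function name and the rough riddle translations here are placeholders, not any particular API):

    # Cross-lingual consistency check: same riddle, different surface language.
    # ask_llm() is a hypothetical stand-in for the model call under test;
    # the non-English translations are rough and only for illustration.

    RIDDLE = {
        "en": "A farmer has 17 sheep. All but 9 run away. How many are left?",
        "de": "Ein Bauer hat 17 Schafe. Alle bis auf 9 laufen weg. Wie viele bleiben?",
        "ar": "لدى مزارع 17 خروفًا، هربت كلها ما عدا 9. كم بقي منها؟",
    }
    EXPECTED = "9"

    def ask_llm(prompt: str) -> str:
        """Placeholder: send the prompt to the model under test, return its answer."""
        raise NotImplementedError

    def cross_lingual_consistency() -> dict[str, bool]:
        # A model that truly generalizes should solve every translation or none;
        # a split result suggests memorization of the dominant-language phrasing.
        return {lang: EXPECTED in ask_llm(text) for lang, text in RIDDLE.items()}

    for lang, correct in cross_lingual_consistency().items():
        print(f"{lang}: {'pass' if correct else 'fail'}")

Averaging over a batch of riddles per language, rather than a single one, would separate genuine capability gaps from noise on any one item.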