Remix clone Hacker News

	▲	noosphr 5 hours ago
		Or they could look at the past few centuries of language theory and start crafting better tokenizers with inductive biases. We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.
	▲	retsibsi 4 hours ago
		> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language
		Can you elaborate? I think you're talking about https://github.com/PastaPastaPasta/llm-chinese-english, but I read those findings as far more nuanced and ambiguous than what you seem to be claiming here.
	▲	umanwizard 4 hours ago
		> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.
		Post a link, because until you do, I’m almost certain this is pseudoscientific crankery. Chinese characters are not an “iron age ontology of meaning” or anything close to that. Also, please cite the specific results in centuries-old “language theory” that you’re referring to. Did Saussure have something to say about LLMs? Or someone even older?