jackblemming 18 hours ago

> is the consensus of many human experts as encoded in its embedding

That’s not true.

ASalazarMX 18 hours ago | parent [-]

Yup, current LLMs are trained on the best and the worst we can offer. I think there's value in training smaller models with strictly curated datasets, to guarantee they've learned from trustworthy sources.

chasd00 17 hours ago | parent [-]

> to guarantee they've learned from trustworthy sources.

I don't see how this will ever work. Even in hard science there's debate over what content is trustworthy and what is not. Imagine trying to declare your source of training material on religion, philosophy, or politics "trustworthy".

ASalazarMX 15 hours ago | parent [-]

"Sir, I want an LLM to design architecture, not to debate philosophy."

But really, you leave the curation to real humans, institutions with ethical procedures already in place. I don't want Google or Elon dictating what truth is, but I wouldn't mind if NASA or other aerospace institutions dictated what is truth in that space.

Of course, the dataset should have a list of every document/source used, so others can audit it. I know, unthinkable in this corporate world, but one can dream.
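That audit list could be as simple as a manifest pairing each source with a content hash, so anyone can verify the published dataset matches what was actually trained on. A minimal sketch (the sources and texts here are made up for illustration):

```python
import hashlib
import json

# Hypothetical curated documents: (source identifier, document text).
documents = [
    ("nasa.gov/report-123", "Aerodynamic load analysis ..."),
    ("esa.int/whitepaper-7", "Orbital mechanics primer ..."),
]

# One manifest entry per document: provenance plus a SHA-256 content
# hash, so auditors can confirm each file is byte-identical to the
# version the curators published.
manifest = [
    {"source": src, "sha256": hashlib.sha256(text.encode()).hexdigest()}
    for src, text in documents
]

print(json.dumps(manifest, indent=2))
```

Re-hashing a local copy of each document and comparing against the manifest is enough to detect any silent substitution or edit.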