mountainriver 5 days ago:
There is knowledge of correct and incorrect, that's what loss is; there are just often many possible answers to a question. This is the same reason that RLVR works: there is just one right answer, and LLMs learn this fairly well, but not perfectly (yet). A minimal sketch of that idea is below.
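(A minimal sketch of what a "verifiable reward" in the RLVR sense looks like, assuming a task like math where a single ground-truth answer can be checked exactly. The helper names and the answer-on-the-last-line convention are illustrative assumptions, not any specific library's API.)

```python
# Sketch: RLVR-style verifiable reward -- 1.0 only when the model's final
# answer matches a checkable ground truth, otherwise 0.0. No partial credit.

def extract_final_answer(completion: str) -> str:
    """Treat the last non-empty line of the completion as the answer (assumed convention)."""
    return completion.strip().splitlines()[-1].strip()

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: right answer or not."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

# A question with exactly one right answer:
print(verifiable_reward("Some reasoning...\n12", "12"))  # 1.0
print(verifiable_reward("Some reasoning...\n13", "12"))  # 0.0
```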
Jensson 5 days ago:
> There is knowledge of correct and incorrect, that's what loss is

Loss is only correctness in terms of correct language, not correct knowledge. It correlates with correct knowledge, but that is all; that correlation is why LLMs are useful for tasks at all, but we still don't have a direct measure of correct knowledge in the models. For language tasks, loss is correctness, so for things like translation LLMs are extremely reliable. But for most other kinds of tasks, loss and correctness are only loosely correlated.
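(To make the "loss scores language, not knowledge" point concrete: pretraining loss is just the average negative log-probability of each observed next token, so it measures how predictable the text is, not whether what it says is true. The toy distributions below are a hypothetical stand-in for an LLM's per-position token probabilities, not real model output.)

```python
# Sketch: next-token cross-entropy loss = -(1/N) * sum(log p(token_i | context_i)).
# Low loss means the text was predictable under the model -- nothing more.
import math

def next_token_loss(model_probs: list[dict[str, float]], tokens: list[str]) -> float:
    """Average negative log-probability of the observed tokens."""
    nll = [-math.log(model_probs[i][tok]) for i, tok in enumerate(tokens)]
    return sum(nll) / len(nll)

# Toy distribution the "model" assigns for the continuation of "the sky is ___".
probs = [{"blue": 0.80, "green": 0.05, "falling": 0.15}]
print(next_token_loss(probs, ["blue"]))   # ~0.22: predictable, low loss
print(next_token_loss(probs, ["green"]))  # ~3.00: unlikely, high loss -- truth never enters into it
```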