Remix.run Logo
FusionX 5 days ago

They partly address this near the end

> It’s doubly hard to distinguish valid statements from invalid ones when you don’t have any examples labeled as invalid. But even with labels, some errors are inevitable. To see why, consider a simpler analogy. In image recognition, if millions of cat and dog photos are labeled as “cat” or “dog,” algorithms can learn to classify them reliably. But imagine instead labeling each pet photo by the pet’s birthday. Since birthdays are essentially random, this task would always produce errors, no matter how advanced the algorithm.

> The same principle applies in pretraining. Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations. Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.