shadowgovt 6 days ago

It's fundamentally the wrong tool to get factual answers from, because the training data doesn't carry a signal for factual correctness.

To synthesize facts out of it, one is essentially relying on most of the human communication in the training data happening to consist of exchanges of factually correct information, and why would we believe that is the case?

astrange 5 days ago

Because people are paying the model companies to give them factual answers, so the companies hire data labellers and invent verification techniques to try to provide them.

Even without that, there's an implicit signal, because factual, helpful people have different writing styles and beliefs than unhelpful people, so if you tell the model to write in a similar style it will (hopefully) provide similarly factual answers. This is why it turns out to be hard to produce an evil racist AI that also answers questions correctly.

lblume 6 days ago

Empirically, there seems to be strong evidence that LLMs give factual output on questions about readily accessible knowledge. Many benchmarks test this.

shadowgovt 6 days ago

Yes, but in the same sense that, empirically, I can swim in the nearby river most days. The city has a combined storm drain / sewer system that overflows into the river, so on some days the water I'd be swimming in is full of shit, and nothing about the infrastructure guards against that happening.

I can tell you how quickly "swimmer beware" becomes "just stay out of the river" when a potential E. coli infection is on the table, and (depending on how important the factuality of the information is) I fully understand people being similarly skeptical of a machine that probably isn't outputting shit but has nothing in its design to actively discourage or prevent it.