bjackman | 6 days ago
Well, you are just directly contradicting the concrete claims made by the post, so one of you is wrong... FWIW, my interpretation is that the hallucination vector encodes the behaviour where the model produces bullshit despite having the facts of the matter encoded in its weights. That is slightly different from producing bullshit as a substitute for information it "doesn't know". And presumably there is a second-order property here: the minimum amount of hallucination is bounded not only by the model's "knowledge" but also by its implicit "meta-knowledge", i.e. the "accuracy of the hallucination vector".
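
For concreteness, here's a minimal sketch of what steering against a "hallucination vector" could look like, assuming a PyTorch-style transformer whose blocks support forward hooks. The layer index, scale, and hallucination_vector are hypothetical placeholders, not anything claimed by the post:

    import torch

    def make_steering_hook(direction: torch.Tensor, scale: float = 1.0):
        # Returns a forward hook that subtracts scale * direction
        # (normalised) from the block's output hidden states,
        # which are assumed to have shape [batch, seq, d_model].
        unit = direction / direction.norm()

        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden - scale * unit.to(hidden.device, hidden.dtype)
            if isinstance(output, tuple):
                return (steered,) + output[1:]
            return steered

        return hook

    # Hypothetical usage: steer a mid layer during generation.
    # handle = model.layers[12].register_forward_hook(
    #     make_steering_hook(hallucination_vector, scale=4.0))
    # ... run generation ...
    # handle.remove()

Under this reading, how well the subtraction works depends on exactly that "meta-knowledge": the vector only suppresses hallucination to the extent that it accurately separates "confabulating despite knowing" from everything else.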