dmortin 2 hours ago

There should be at least some correlation. When building the model, they give more weight to some pages (e.g. Wikipedia) that carry more trust (pagerank?). And when they provide links in answers, the matches with better pagerank for the query are listed first.

So if it draws something from Wikipedia, it is more likely to cite Wikipedia as a trusted source for it.
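To make the idea concrete, here is a rough sketch of what "trusted matches are listed first" could look like. It is only a guess at the shape of such a ranking, not how any actual product works: the `Source` class, the `relevance` and `trust` scores, the `trust_weight` blend, and the example URLs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    relevance: float  # hypothetical match score for the query, 0..1
    trust: float      # hypothetical pagerank-like trust score, 0..1


def rank_sources(sources, trust_weight=0.5):
    """Order candidate sources by a blend of relevance and trust.

    trust_weight is an assumed knob: 0 ranks by relevance only,
    1 ranks by trust only.
    """
    def score(s):
        return (1 - trust_weight) * s.relevance + trust_weight * s.trust

    return sorted(sources, key=score, reverse=True)


# Made-up candidates: a slightly more relevant blog post and a more trusted page.
candidates = [
    Source("https://example.org/blog-post", relevance=0.9, trust=0.2),
    Source("https://en.wikipedia.org/wiki/Example", relevance=0.8, trust=0.95),
]

ranked = rank_sources(candidates)
relevance_only = rank_sources(candidates, trust_weight=0.0)
```

With the default blend the more trusted page comes first even though the blog post matches the query slightly better. With `trust_weight=0.0` the order flips.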

The problem is when an answer is hallucinated or false: it may provide a source for it that contains the invalid info.

MelonUsk 2 hours ago | parent

Yep, a few non-profits work on direct training data attribution:

OLMoTrace, Guide Labs with Clarity, and a few more

Labs train the model with attribution baked in, and they say the bigger the model, the more interpretable it becomes

Pretty sure it’s the future