Remix.run Logo
aminerj 3 hours ago

sidrag22 is right on the technical separation. The more interesting question for this specific attack is whether provenance metadata changes model behavior at generation time, not just provides an audit trail after the fact.

In my testing, the poisoned documents were more authoritative-sounding than the legitimate one — "CFO-approved correction", "board-verified restatement" vs. a plain financial summary. The legitimate document had no authority signals at all. If chunk metadata included "source: finance-system, ingested: 2024-Q1, author: cfo-office@company.com" surfaced directly in the prompt context, the model has something to reason about rather than just comparing document rhetoric.