Remix clone Hacker News

new | show | ask | jobs Github

	▲	aminerj 3 hours ago
		On the reference point: the poisoning succeeded even when the legitimate document was present in the retrieved chunks and visible in the context. The LLM saw all three sources simultaneously, including the correct $24.7M figure, and still produced the fabricated answer because the poisoned documents framed the legitimate one as a known error. Providing a reference to the retrieved chunks doesn't help if the retrieved chunks themselves are the attack surface. zenoprax's point about ignorant employees is also worth taking seriously. "Write access to the knowledge base" in practice means anyone who can edit a Confluence page, commit to a docs repo, or submit a support ticket that gets ingested. That's not critical access in most organizations.