Remix clone Hacker News

	sciencejerk 5 hours ago
		I wonder if any poisoned data made it into LLM training data pipelines?
	ibejoeb 5 hours ago | parent
		Interesting angle. Everyone has already pointed out that there are backups basically everywhere, and from an information standpoint, shaving off a day (or whatever) of edits just to get back to a known-good point is effectively zero cost. But I wonder what it costs to have the potentially bad data baked into those models, and whether anyone cares enough to scrap it.