sciencejerk 5 hours ago

I wonder if any poisoned data made it into LLM training data pipelines?

ibejoeb 5 hours ago

Interesting angle. Everyone has already pointed out that there are backups basically everywhere, and from an information standpoint, shaving off a day (or whatever) of edits just to get to a known-good point is effectively zero cost. But I wonder what the cost is of the potentially bad data getting baked into those models, and if anyone really cares enough to scrap it.