That's exactly why text written before the first LLMs carries a premium these days. So no, all major models suffer from slop in their training data.