abound 2 days ago

This intuitively makes sense (like deep-frying a JPEG), but it doesn't seem to happen in practice: modern models are frequently trained on text that was generated by other models, curated by other models, or both.

Realistically, going forward, model training will just need to incorporate a step that removes data below some quality threshold, whether it's LLM-generated or not.