bwfan123 2 days ago

We may enter a vicious cycle where writing is increasingly generated by LLMs. LLMs then end up training on their own output, leading to model collapse.

Hence, the models depend on human writing.

abound 2 days ago | parent [-]

This intuitively makes sense (like deep-frying a JPEG), but it doesn't seem to happen in practice: modern models are routinely trained on text that is both generated by other models and curated by other models.

Realistically, going forward, model training will just need a step that filters out data below some quality threshold, whether it's LLM-generated or not.
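
The quality-threshold step could be sketched like this. The scoring heuristic below (length and repetition checks) is invented purely for illustration; real pipelines use learned quality classifiers, but the filtering structure is the same:

```python
# Toy sketch of a quality-threshold filter over a training corpus.
# quality_score is a crude stand-in for a real quality classifier.

def quality_score(text: str) -> float:
    """Crude proxy for text quality: penalize very short or repetitive text."""
    words = text.split()
    if len(words) < 5:
        return 0.0
    # Ratio of distinct words to total words; repetitive text scores low.
    return len(set(words)) / len(words)

def filter_corpus(docs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only documents whose quality score clears the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]

corpus = [
    "the the the the the the the the",  # highly repetitive, low score
    "short",                            # too short, score 0.0
    "a reasonably varied sentence with many distinct words",
]
print(filter_corpus(corpus))  # only the varied sentence survives
```

The key point is that the filter is agnostic about provenance: it scores text on its own merits, so high-quality model output passes and low-quality human output is dropped.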