ronsor 6 days ago

That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.

bogwog 6 days ago

If the "garbage data" is AI generated, it'll be hard or impossible to filter.