ronsor 6 days ago
That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.
bogwog 6 days ago
If the "garbage data" is AI generated, it'll be hard or impossible to filter.