phatfish 2 hours ago
Isn't non-LLM-generated text becoming more valuable for training as the web at large is flooded with slop? Preventing new human-generated text from being used by AI firms (without consent) seems like a valid strategy.
tossandthrow an hour ago | parent
No. Modern LLMs are trained on a large proportion of synthetic data, so this sentiment is largely outdated, even though it dates back only a couple of years.