| ▲ | StephenHerlihyy 4 hours ago | |
I don’t know why anyone would still be trying to pull data off the open internet. The signal-to-noise ratio is too low, and so much AI influence is already baked into the corpus that you are just going to be reinforcing existing bias. I’m more worried about the day Amazon or Hugging Face take down their large datasets. | ||
| ▲ | saaaaaam 4 hours ago | |
MetaBrainz is a fairly valuable “high-signal” dataset, though. | ||