Lerc (4 hours ago):
Where from? And, quite frankly, why? There are existing training datasets that are large enough for smaller models, and larger models have been focusing on data quality more than quantity. There's limited utility in further indiscriminate, widespread scraping.
danaris (3 hours ago), in reply:
Tell that to the idiots doing the scraping. Small site operators like us know very well that the utility they can get from scraping us is marginal at best. Based on their patterns of behavior, though, my best guess is that they've simply configured their bots to scrape absolutely everything, all the time, forever, as aggressively as possible, and to treat any attempt to signal "hey, this data isn't useful to you" as an adversarial move by a site operator trying to hide things from them that they consider their God-given right.