RandomGerm4n 4 hours ago
That reasoning is complete nonsense. If someone scrapes the data, that’s not a big deal, and Reddit has no more claim to the data than anyone else. The data is public and created by users, not by the platform itself. If the traffic is too much for them, they could use rate limiting or simply offer an archive of the data as a torrent once a month. That way, a company training an LLM could access a complete dataset without generating a lot of traffic.