novok 7 days ago

So could they keep Library Genesis and other pirate sources on a local server and use that as training data, then? That is the level I'm speaking of, much like Common Crawl and the Reddit archive.

1gn15 6 days ago | parent [-]

Oh, yeah, no, you can't. The data has to be obtained legally. Common Crawl and the Reddit archives should be fine, though. TOSes don't count.