novok | 7 days ago
So could they keep Library Genesis and other pirate sources on a local server and use that as training data, then? That is the level I'm speaking of, much like Common Crawl and the Reddit archive.
1gn15 | 6 days ago
Oh, yeah, no, you can't. The data has to be obtained legally. Common Crawl and the Reddit archives should be fine, though. TOSes don't count.