>a bunch of internet pages containing things that are blatantly wrong
So Reddit?
I’d imagine the AI companies have very carefully catalogued all the “pre-AI internet” data they scraped.