EDIT: probably not relevant, after re-re-reading the comment in question.
Presumably littlestymaar is talking about all the LLM-generated output that's publicly available on the Internet (of varying quality, but in significant quantity) and there for the scraping.