PaulRobinson 12 hours ago

It's been alleged that a major source of training data for many LLMs was libgen and SciHub - hardly casual.

maxbond 12 hours ago

Even if that were comparable in size to the conversational Internet, how many novels and academic papers have you read that used multiple "not just A, but B" constructions in a single chapter/paper (that were not written by/about AI)?