chrisjj 10 hours ago

So, a small proportion of articles were detected as bot-written, and a large proportion of those failed validation.

What if in fact a large proportion of articles were bot-written, but only the unverifiable ones were bad enough to be detected?

EdwardDiego 9 hours ago | parent

Human editors, I suspect, would pick up the "tells" of generated text, although as we know, there are a lot of false positives in that space.

But it looks like Pangram is a text-classifying NN trained using a technique where they get a human to write a body of text on a subject, and then get various LLMs to write a body of text on the same subject, which strikes me as a good way to approach the problem. Not that I'm in any way qualified to properly understand ML.

More details here: https://arxiv.org/pdf/2402.14873
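
If I'm reading that right, the core idea is just a paired dataset: human text and LLM text on the same subject, labelled and fed to a binary classifier. A rough sketch of that setup, with entirely made-up toy data, and a TF-IDF + logistic regression stand-in instead of the neural encoder Pangram actually uses, just to show the shape of it:

    # Toy illustration of training on paired human/LLM text; not Pangram's code.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical paired corpus: for each human-written passage, an LLM was
    # prompted to write on the same subject.
    human_texts = [
        "The committee voted 5-2 to postpone the zoning decision until March.",
        "I tested the patch on a spare laptop and the kernel panic went away.",
    ]
    llm_texts = [
        "The committee has decided to delay its zoning decision, highlighting the complexity of the issue.",
        "After applying the patch, the system demonstrated improved stability, underscoring the value of timely updates.",
    ]

    texts = human_texts + llm_texts
    labels = [0] * len(human_texts) + [1] * len(llm_texts)  # 0 = human, 1 = AI

    # Bag-of-words stand-in for the neural text classifier; same data/label setup.
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)

    print(clf.predict_proba(["Some new article text to check."]))

The paired-by-subject part seems to be what matters: the classifier has to learn stylistic differences rather than topic differences, since both classes cover the same subjects.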