paulpauper an hour ago

My take: this seems excessive.

ArXiv doesn't even check the submission closely, so how can they know?

They say "errors, mistakes"

They use an automated system to check if the basic requirements were met, and sometimes papers are flagged for further superficial human review, but there is no way they can possibly do this at scale or check every reference. This would be like trying to do peer review, but for a preprint archive that gets easily 100x more volume than any journal.

Second, there is a huge gap between publishing on arXiv and peer review. I can attest personally that it's not even close. I've gotten probably a dozen rejections from peer review and no problems publishing in arXiv math. This is because peer review checks not just whether something is new or correct, but also whether it's of "interest to the math community," which is inherently subjective, and also makes peer review orders of magnitude harder than publishing on arXiv.

Even though a well-known professor in number theory praised the paper when I got an endorsement, and a second one emailed me and encouraged me to publish it, it still got rejected 3 times, and I'm still waiting.

Being required to publish in a peer-reviewed journal would close off arXiv for many researchers for good. It also defeats the point of it being a pre-print archive.

helterskelter an hour ago | parent [-]

You could at least trivially filter out hallucinated references that simply don't exist, I'd imagine.

paulpauper an hour ago | parent [-]

It's more than that. If there are mistakes, you can also be flagged.

Read the whole tweet:

If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).

nutjob2 an hour ago | parent [-]

If you'd read the whole series of tweets, it's obvious that this is not their intention, and that there needs to be "incontrovertible evidence that the authors did not check the results of LLM generation" for the penalty to apply.

It's not hard to divine their intentions: you are entirely responsible for what you submit, and if it's clearly slop(py) you get a ban. In a reply they state that they are seeking to apply this rule fairly and accurately and are mindful of unintended effects.