zzzeek a day ago

Correct me if I'm wrong, but citations in papers follow a specific format, and the case here is that a tool was used to validate that they are all real. Surely a tool that scans a paper for all citations and verifies that they actually exist in the journals they reference shouldn't be all that technically difficult to build?

alexcdot a day ago

There are a ton of edge cases, and it takes a bit of contextual understanding to decide what counts as a hallucinated citation (e.g. what if it's republished from arXiv to ICLR?)

But to your point, it seems we need a tool that can do this.

mike_hearn 16 hours ago

It's not difficult; there are lots of ways to resolve citations without even using AI.

I experimented a couple of years ago with getting LLMs to check citations but stopped working on it because there's no incentive. You could run a fancy expensive pipeline burning scarce GPU hours and find a bunch of bad citations. Then what? Nobody cares. No journal is going to retract any of these papers, the academics themselves won't care or even respond to your emails, nobody is willing to pay for this stuff, least of all the universities, journals or governments themselves.

For example, there's a guy in France who runs a pre-LLM pipeline to discover bad papers using hand-coded heuristics like regexes or metadata analysis, e.g. checking if a citation has been retracted. Many of the things it detects are plagiarism, paper mills (i.e. companies that sell fake papers to academics for a profit), or the output of joke paper generators like SciGen.

https://dbrech.irit.fr/pls/apex/f?p=9999:1::::::

Other than populating an obscure database nobody knows about, this work achieved bupkis.
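
To give a rough idea of what a "regex plus metadata" heuristic can look like, here is a minimal sketch. It is not the pipeline linked above. The DOI pattern is a simplification, the function names are made up for illustration, and the retracted list is assumed to come from some external source you already have; nothing here queries a real database.

```python
import re

# Simplified DOI pattern (assumption: modern "10.<registrant>/<suffix>" DOIs only;
# real reference lists contain older and messier forms that this will miss).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(reference_text):
    """Return DOIs found in the text, lowercased, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.findall(reference_text):
        # Drop trailing punctuation that usually belongs to the sentence, not the DOI.
        doi = match.rstrip(".,;)").lower()
        if doi not in found:
            found.append(doi)
    return found


def flag_retracted(reference_text, retracted_dois):
    """Return the cited DOIs that also appear in a caller-supplied retracted set.

    `retracted_dois` is hypothetical input: any iterable of DOI strings you trust.
    """
    retracted = {d.lower() for d in retracted_dois}
    return [doi for doi in extract_dois(reference_text) if doi in retracted]
```

You'd call it as `flag_retracted(references_section, known_retracted)` and get back the cited DOIs that need a second look. Citations without a DOI fall straight through, which is where the edge cases mentioned upthread start.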