| ▲ | jameshart a day ago |
| Is the baseline assumption of this work that an erroneous citation is LLM hallucinated? Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong? |
|
| ▲ | miniwark a day ago | parent | next [-] |
| They explain in the article what they consider a proper citation, an erroneous one, and a hallucination, in the section "Defining Hallucitations". They also say that they have many false positives, mostly real papers that are not available online. That said, I am also very curious what results their tool would give for papers from the 2010s and before. |
| |
| ▲ | sigmoid10 a day ago | parent [-] | | If you look at their examples in the "Defining Hallucitations" section, I'd say those could be 100% human errors. Shortening authors' names, leaving out authors, misattributing authors, and misspelling or misremembering the paper title (or using an old preprint title, as titles do change) are all things I would fully expect to happen to anyone in any field where things ever get published. Modern tools have made the citation process more comfortable, but if you go back to the old days, you'd probably find those kinds of errors everywhere. If you look at the full list of "hallucinations" they claim to have discovered, the only ones I'd not immediately blame on human screwups are the ones where the title and the authors got zero matches against existing papers/people. If you really wanted to do this kind of analysis correctly, you'd have to take the claim in the text and verify it against the cited article. Because I think it would be even more dangerous if you could get claims accepted simply by quoting an existing paper correctly while completely ignoring its content (which would have worked here). | | |
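The zero-match criterion described above (a cited title and authors that match nothing in the record) can be sketched with fuzzy string matching. A minimal, hypothetical Python sketch, with made-up titles and a toy in-memory index standing in for a real bibliographic database:

```python
# Sketch of a zero-match check: a misspelled-but-real title still scores
# high against an index, while a title that matches nothing scores low.
from difflib import SequenceMatcher

# Hypothetical index; a real checker would query an external database.
KNOWN_TITLES = [
    "attention is all you need",
    "deep residual learning for image recognition",
]

def best_match(cited_title: str) -> float:
    """Highest similarity ratio between the cited title and the index."""
    t = cited_title.lower().strip()
    return max(SequenceMatcher(None, t, k).ratio() for k in KNOWN_TITLES)

# A misspelled but real title scores close to 1.0 (likely human error)...
print(best_match("Atention is All You Need"))
# ...while a title matching nothing scores low everywhere (hallucination candidate).
print(best_match("Quantum Swarm Graphs for Moral AI"))
```

This only separates "typo'd but plausibly real" from "no match at all"; it says nothing about whether the cited paper actually supports the claim, which is the harder verification step.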
| ▲ | Majromax a day ago | parent | next [-] | | > Modern tools have made the citation process more comfortable, That also makes some of those errors easier. A bad auto-import of paper metadata can silently screw up some of the publication details, and replacing an early preprint with the peer-reviewed article of record takes annoying manual intervention. | |
| ▲ | mike_hearn 15 hours ago | parent | prev | next [-] | | There are other issues. In January they claimed that a US health report contained "fabricated" and "AI generated" citations, the headline example being a claim from a Cigna Group report. Their claim that it's fabricated is based on nothing more than the URL now being a redirect of the type common in corporate website reorgs. I did some checking and found the report does exist, though the citation is still not quite correct. Then I discovered someone is already running an LLM-based citation checker, which fact-checked this claim and produced a correct write-up that seems a lot better than what this GPTZero tool does. https://checkplease.neocities.org/maha/html/17-loneliness-73... The mistakes in the citation are the sort that could have been made by either a human or an AI. The visualization in the report is confusing and does contain the 73% number (rounded up), but it's unclear how to interpret the numbers, because it's some sort of "vitality index" and not what you'd expect from how it's introduced. At first glance I actually misinterpreted it the same way the report does, so it's hard to view this as clear evidence of AI misuse. Yet the GPTZero folks make very strong claims based on nothing more than a URL scraper script. | |
| ▲ | 15 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | jameshart a day ago | parent | prev [-] | | I mean, if you’re able to take the citation, find the cited work, and definitively state ‘looks like they got the title wrong’ or ‘they attributed the paper to the wrong authors’, that doesn’t sound like what people usually mean when they say a ‘hallucinated’ citation. Work that is lazily or poorly cited but nonetheless attempts to cite real work is not the problem. Work which gives itself false authority by claiming to cite works that simply do not exist is the main concern surely? | | |
| ▲ | sigmoid10 a day ago | parent [-] | | >Work which gives itself false authority by claiming to cite works that simply do not exist is the main concern surely? You'd think so, but apparently it isn't for these folks. On the other hand, saying "we've found 50 hallucinations in scientific papers" generates a lot more clicks than "we've found 50 common citation mistakes that people make all the time" |
|
|
|
|
| ▲ | _alternator_ a day ago | parent | prev | next [-] |
| Let me second this: a baseline analysis should include papers that were published or reviewed at least 3-4 years ago. When I was in grad school, I kept a fairly large .bib file that almost certainly had a mistake or two in it. I don't think any of them ever made it to print, but it's hard to be 100% sure. Most journals actually check your citations, at least partially, as part of the final editing: the citation record is important to journals, and linking with DOIs is fairly common. |
|
| ▲ | currymj a day ago | parent | prev | next [-] |
| The papers themselves are publicly available online too. Most of the ones I spot-checked give the extremely strong impression of AI generation: not just some hallucinated citations, and not just the writing. In many cases the actual purported research "ideas" seem to be plausible nonsense. To get a feel for it, take some of the topics they write about and ask your favorite LLM to generate a paper. Maybe even throw "Deep Research" mode at it. Perhaps tell it to put it in ICLR LaTeX format. It will look a lot like these. |
|
| ▲ | llm_nerd a day ago | parent | prev | next [-] |
| People commonly hold that LLMs are unusable because they make mistakes. So do people. Books have errors. Papers have errors. People have flawed knowledge, often degraded through a conceptual game of telephone. Exactly as you said: run precisely this check on pre-LLM works, and with utter certainty you will find an enormous number of errors. People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen. |
| |
| ▲ | pmontra a day ago | parent | next [-] | | Fabricated citations are not errors. A pre-LLM paper with fabricated citations would demonstrate a will to cheat by the author. A post-LLM paper with fabricated citations: same thing, and if the authors attempt to defend themselves with something like "we trusted the AI", they are sloppy, probably cheaters, and not very good at it. | | |
| ▲ | mapmeld a day ago | parent | next [-] | | Further, if I use AI-written citations to back some claim or fact, what are the actual claims or facts based on? This started happening in law because someone writes the text and then wishes there were a source that is relevant and actually supportive of their claim. But if someone puts in the labor to check, even the real/extant sources may have nothing backing the claim (e.g. the MAHA report). | |
| ▲ | llm_nerd a day ago | parent | prev [-] | | >Fabricated citations are not errors. Interesting that you hallucinated the word "fabricated" here, where I talked broadly about errors. Humans, right? Can't trust them. Firstly, just about every paper ever written in the history of papers has errors in it. Some small, some big. Most accidental, but some intentional. Sometimes people are sloppy keeping notes, mis-transcribe a row, get a name wrong, or make an off-by-one error. Sometimes they just entirely make up data or findings. This is not remotely new. It has happened as long as we've had papers. Find an old, pre-LLM paper and go through the citations -- especially for a tosser target like this, where there are tens of thousands of low-effort papers submitted -- and you're going to find a lot of sloppy citations that are hard to rationalize. Secondly, in many cases the "hallucination" is just that this particular snake-oil firm couldn't find the given paper (they aren't foolish enough to think that means it was fabricated. But again, they're looking to sell a tool to rubes, so the conclusion is good enough), and in others that some of the author names are wrong. Eh. | | |
| ▲ | the_af a day ago | parent [-] | | > Firstly, just about every paper ever written in the history of papers has errors in it LLMs make it easier and faster, much like guns make killing easier and faster. |
|
| |
| ▲ | nkrisc a day ago | parent | prev | next [-] | | Under what circumstances would a human mistakenly cite a paper which does not exist? I’m having difficulty imagining how someone could mistakenly do that. | | |
| ▲ | jameshart a day ago | parent [-] | | The issue here is that many of the ‘hallucinations’ this article cites aren’t ’papers which do not exist’. They are incorrect author attributions, publication dates, or titles. |
| |
| ▲ | the_af a day ago | parent | prev | next [-] | | LLMs are a force multiplier for this kind of error, though. It's not easy to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans. Humans can do all of the above, but it costs them more and they do it more slowly. LLMs generate spam at a much faster rate. | | |
| ▲ | llm_nerd a day ago | parent [-] | | >It's not easy to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans. But no one is claiming these papers were hallucinated whole, so I don't see how that's relevant. This study -- notably one meant to sell an "AI detector", a largely laughable snake-oil field -- looked purely at the accuracy of citations[1] among a very large set of citations. Errors in papers are not remotely uncommon, and finding some errors is... exactly what one would expect. As the GP said, do the same study on pre-LLM papers and you'll find an enormous number of incorrect if not fabricated citations. Peer review has always been an illusion of auditing. 1 - Which is such a weird way to sell an "AI detection" tool. Clearly it was mostly manual, given that they somehow only managed to check a tiny subset of the papers, so in all likelihood it was some guy going through citations and checking them on Google Search. | | |
| ▲ | the_af a day ago | parent [-] | | I've zero interest in the AI tool, I'm discussing the broader problem. The references were made up, and this is easier and faster to do with LLMs than with humans. Easier to do inadvertently, too. As I said, LLMs are a force multiplier for fraud and inadvertent errors. So it's a big deal. | | |
| ▲ | throwaway-0001 a day ago | parent [-] | | I think we should see a chart of the percentage of "fabricated" references over the past 20 years. We should see a huge increase after 2020-2021. Does anyone have this chart data? |
|
|
| |
| ▲ | add-sub-mul-div a day ago | parent | prev [-] | | Quoting myself from just last night because this comes up every time and doesn't always need a new write-up. > You also don't need gunpowder to kill someone with projectiles, but gunpowder changed things in important ways. All I ever see are the most specious knee-jerk defenses of AI that immediately fall apart. | | |
|
|
| ▲ | tokai a day ago | parent | prev [-] |
| Yeah that is what their tool does. |