llm_nerd a day ago

People commonly dismiss LLMs as unusable because they make mistakes. So do people. Books have errors. Papers have errors. People have flawed knowledge, often degraded through a conceptual game of telephone.

Exactly as you said: apply precisely this scrutiny to pre-LLM works, and you will find an enormous number of errors, with utter certainty.

People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen.

pmontra a day ago | parent | next

Fabricated citations are not errors.

A pre-LLM paper with fabricated citations would demonstrate the author's intent to cheat.

A post-LLM paper with fabricated citations: same thing, and if the authors attempt to defend themselves with something like "we trusted the AI," they are sloppy, probably cheaters, and not very good at it.

mapmeld a day ago | parent | next

Further, if I use AI-written citations to back some claim or fact, what are the actual claims or facts based on? This started happening in law because someone writes the text and then wishes there were a source that is relevant and actually supports their claim. But if someone puts in the labor to check whether your sources are real and extant, there's nothing backing the claim (e.g. the MAHA report).

llm_nerd a day ago | parent | prev

>Fabricated citations are not errors.

Interesting that you hallucinated the word "fabricated" here where I broadly talked about errors. Humans, right? Can't trust them.

Firstly, just about every paper ever written in the history of papers has errors in it. Some small, some big. Most accidental, but some intentional. Sometimes people are sloppy keeping notes, transcribe the wrong row, get a name wrong, make an off-by-one error. Sometimes they just entirely make up data or findings. This is not remotely new; it has happened as long as we've had papers. Take old, pre-LLM papers and go through the citations -- especially for a tosser target like this, where there are tens of thousands of low-effort papers submitted -- and you're going to find a lot of sloppy citations that are hard to rationalize.

Secondly, the "hallucination" here is that this particular snake-oil firm couldn't find the cited papers in many cases (they aren't foolish enough to believe that proves the papers were fabricated, but they're looking to sell a tool to rubes, so the conclusion is good enough), and in other cases that some of the author names are wrong. Eh.

the_af a day ago | parent

> Firstly, just about every paper ever written in the history of papers has errors in it

LLMs make it easier and faster, much like guns make killing easier and faster.

nkrisc a day ago | parent | prev | next

Under what circumstances would a human mistakenly cite a paper which does not exist? I’m having difficulty imagining how someone could mistakenly do that.

jameshart a day ago | parent

The issue here is that many of the ‘hallucinations’ this article cites aren't ‘papers which do not exist’. They are incorrect author attributions, publication dates, or titles.

the_af a day ago | parent | prev | next

LLMs are a force multiplier for this kind of error, though. It's not easy to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans.

Humans can do all of the above, but it costs them more and they do it more slowly. LLMs generate spam at a much faster rate.

llm_nerd a day ago | parent

>It's not easy to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans.

But no one is claiming these papers were hallucinated whole, so I don't see how that's relevant. This study -- run, notably, to sell an "AI detector", which is largely a laughable snake-oil field -- looked purely at citation accuracy[1] across a very large set of citations. Errors in papers are not remotely uncommon, and finding some is... exactly what one would expect. As the GP said: do the same study on pre-LLM papers and you'll find an enormous number of incorrect, if not fabricated, citations. Peer review has always been an illusion of auditing.

1 - Which is such a weird basis on which to sell an "AI detection" tool. The checking was clearly mostly manual, given that they somehow only managed to check a tiny subset of the papers; in all likelihood it was some guy going through citations and checking them on Google Search.
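
For what it's worth, this sort of check is easy to script against a public index like Crossref, no "AI detector" needed. A rough sketch (the loose-match heuristic is my own assumption, not whatever this firm actually did):

    # Hypothetical sketch: does a cited title resolve to a real record via
    # Crossref's public REST API? The matching heuristic is illustrative only.
    import requests

    def citation_seems_real(title: str, author_surname: str) -> bool:
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": title, "rows": 1},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        if not items:
            return False  # no hit: possibly fabricated, possibly just obscure
        top = items[0]
        found_title = (top.get("title") or [""])[0].lower()
        surnames = {a.get("family", "").lower() for a in top.get("author", [])}
        # Loose match: queried title appears in the top hit, or the surname
        # is among its authors. A miss is a flag to review, not proof of fraud.
        return title.lower() in found_title or author_surname.lower() in surnames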

the_af a day ago | parent

I have zero interest in the AI tool; I'm discussing the broader problem.

The references were made up, and this is easier and faster to do with LLMs than by hand. Easier to do inadvertently, too.

As I said, LLMs are a force multiplier for fraud and inadvertent errors. So it's a big deal.

throwaway-0001 a day ago | parent

I think we should see a chart of the % of “fabricated” references over the past 20 years. We should see a huge increase after 2020-2021. Does anyone have this chart data?
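
If anyone does have per-paper audit data, the chart itself would be trivial. A sketch assuming a hypothetical CSV with one row per audited paper (the file name and columns are made up for illustration; no such public dataset is cited here):

    # Hypothetical sketch: yearly share of audited papers with at least one
    # fabricated reference. Assumes "citation_audit.csv" with columns
    # year, has_fabricated_ref (0/1).
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("citation_audit.csv")
    rate = df.groupby("year")["has_fabricated_ref"].mean() * 100

    rate.plot(kind="bar")
    plt.ylabel("% of audited papers with a fabricated ref")
    plt.tight_layout()
    plt.show()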

add-sub-mul-div a day ago | parent | prev

Quoting myself from just last night because this comes up every time and doesn't always need a new write-up.

> You also don't need gunpowder to kill someone with projectiles, but gunpowder changed things in important ways. All I ever see are the most specious knee-jerk defenses of AI that immediately fall apart.

llm_nerd a day ago | parent

[flagged]