GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers(gptzero.me)
303 points by segmenta 2 hours ago | 177 comments
cogman10 2 hours ago | parent | next [-]

Yuck, this is going to really harm scientific research.

There is already a problem with papers falsifying data/samples/etc, LLMs being able to put out plausible papers is just going to make it worse.

On the bright side, maybe this will get the scientific community and science journalists to finally take reproducibility more seriously. I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

vld_chk 37 minutes ago | parent | next [-]

In my mental model, the fundamental problem of reproducibility is that scientists have a very hard time finding a penny to fund such research. No one wants to grant “hey, I need $1m and 2 years to validate that suspicious-looking paper from last year”.

Until we change how we fund science at a fundamental level, how we assign grants, it will indeed be a very hard problem to deal with.

parpfish 25 minutes ago | parent | next [-]

In theory, asking grad students and early career folks to run replications would be a great training tool.

But the problem isn’t just funding, it’s time. Successfully running a replication doesn’t get you a publication to help your career.

iugtmkbdfil834 2 minutes ago | parent [-]

Yeah, but doesn't publishing an easily falsifiable paper end one?

poszlem 21 minutes ago | parent | prev [-]

I often think we should move from peer review as "certification" to peer review as "triage", with replication determining how much trust and downstream weight a result earns over time.

StableAlkyne an hour ago | parent | prev | next [-]

> I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

Most people (that I talk to, at least) in science agree that there's a reproducibility crisis. The challenge is there really isn't a good way to incentivize that work.

Fundamentally (unless you're independently wealthy and funding your own work), you have to measure productivity somehow, whether you're at a university, government lab, or the private sector. That turns out to be very hard to do.

If you measure the raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk. Some of it is good, but there is such a tidal wave of shit that most people write off your work as a heuristic, based on the rest of your cohort.

So, instead it's more common to try to incorporate how "good" a paper is, to reward people with a high quantity of "good" papers. That's quantifying something subjective though, so you might try to use something like citation count as a proxy: if a work is impactful, usually it gets cited a lot. Eventually you may arrive at something like the H-index, which is defined as "the largest number H you can pick, where you have written H papers with at least H citations each." Now, the trouble with this method is people won't want to "waste" their time on incremental work.
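
For concreteness, a minimal sketch of computing that metric (the citation counts in the example are made up, and in practice they'd come from an index like Scholar):

    def h_index(citation_counts):
        # Largest h such that the author has h papers with at least h citations each
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for i, c in enumerate(counts, start=1):
            if c >= i:
                h = i
            else:
                break
        return h

    # Five papers with these (made-up) citation counts give an H-index of 3
    print(h_index([10, 8, 5, 2, 1]))  # -> 3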

And that's the struggle here; even if we funded and rewarded people for reproducing results, they will always be bumping up the citation count of the original discoverer. But it's worse than that, because literally nobody is going to cite your work. In 10 years, they just see the original paper, a few citing works reproducing it, and to save time they'll just cite the original paper only.

There's clearly a problem with how we incentivize scientific work. And clearly we want to be in a world where people test reproducibility. However, it's very very hard to get there when one's prestige and livelihood is directly tied to discovery rather than reproducibility.

gcr 34 minutes ago | parent | next [-]

I'd personally like to see top conferences grow a "reproducibility" track. Each submission would be a short tech report that chooses some other paper to re-implement. Cap 'em at three pages, have a lightweight review process. Maybe there could be artifacts (git repositories, etc) that accompany each submission.

This would especially help newer grad students learn how to begin to do this sort of research.

Maybe doing enough reproductions could unlock incentives. Like if you do 5 reproductions then the AC would assign your next paper double the reviewers. Or, more invasively, maybe you can't submit to the conference until you complete some reproduction.

maerF0x0 an hour ago | parent | prev | next [-]

> The challenge is there really isn't a good way to incentivize that work.

What if we got Undergrads (with hope of graduate studies) to do it? Could be a great way to train them on the skills required for research without the pressure of it also being novel?

StableAlkyne an hour ago | parent | next [-]

Those undergrads still need to be advised and they use lab resources.

If you're a tenure-track academic, your livelihood is much safer from having them try new ideas (that you will be the corresponding author on, increasing your prestige and ability to procure funding) instead of incrementing.

And if you already have tenure, maybe you have the undergrad do just that. But the tenure process heavily filters for ambitious researchers, so it's unlikely this would be a priority.

If instead you did it as coursework, you could get them to maybe reproduce the work, but if you only have the students for a semester, that's not enough time to write up the paper and make it through peer review (which can take months between iterations).

suddenlybananas an hour ago | parent | prev [-]

Unfortunately, that might just lead to a bunch of type II errors instead, if an effect requires very precise experimental conditions that undergrads lack the expertise for.

poulpy123 an hour ago | parent | prev | next [-]

> I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

But nobody wants to pay for it.

MetaWhirledPeas 24 minutes ago | parent | prev | next [-]

> Eventually you may arrive at something like the H-index, which is defined as "the largest number H you can pick, where you have written H papers with at least H citations each."

It's the Google search algorithm all over again. And it's the certificate trust hierarchy all over again. We keep working on the same problems.

Like the two cases I mentioned, this is a matter of making adjustments until you have the desired result. Never perfect, always improving (well, we hope). This means we need liquidity with the rules and heuristics. How do we best get that?

sroussey 3 minutes ago | parent [-]

Incentives.

First X people that reproduce Y get Z percent of patent revenue.

Or something similar.

geokon an hour ago | parent | prev | next [-]

Usually you reproduce previous research as a byproduct of doing something novel "on top" of the previous result. I don't really see the problem with the current setup.

Sometimes you can just do something new and assume the previous result, but that's more the exception. You're almost always going to at least partially reproduce the previous one, and if issues come up, it's often evident.

That's why citations work as a good proxy: X number of people have done work based around this finding and nobody has seen a clear problem.

gcr 32 minutes ago | parent [-]

It's often quite common to see a citation say "BTW, we weren't able to reproduce X's numbers, but we got fairly close number Y, so Table 1 includes that one next to an asterisk."

The difficult part is surfacing that information to readers of the original paper. The semantic scholar people are beginning to do some work in this area.

graemep 22 minutes ago | parent | prev | next [-]

> you have to measure productivity somehow,

No, you do not have to. You give people with the skills and interest in doing research the money. You need to ensure it's spent correctly, that is all. People will be motivated by wanting to build a reputation and the intrinsic reward of the work.

warkdarrior an hour ago | parent | prev | next [-]

> If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk.

This is exactly what rewarding replication papers (that reproduce and confirm an existing paper) will lead to.

pixl97 an hour ago | parent [-]

And yet if we can't reproduce an existing paper, it's very possible that existing paper is junk itself.

Catch-22 is a fun game to get caught in.

jimbokun an hour ago | parent | prev [-]

> The challenge is there really isn't a good way to incentivize that work.

Ban publication of any research that hasn't been reproduced.

wpollock 42 minutes ago | parent | next [-]

> Ban publication of any research that hasn't been reproduced.

Unless it is published, nobody will know about it and thus nobody will try to reproduce it.

sroussey 2 minutes ago | parent [-]

Just have a new journal of only papers that have been reproduced, and include the reproduction papers.

gcr 39 minutes ago | parent | prev [-]

lol, how would the first paper carrying some new discovery get published?

mike_hearn an hour ago | parent | prev | next [-]

Reproducibility is overrated and if you could wave a wand to make all papers reproducible tomorrow, it wouldn't fix the problem. It might even make it worse.

https://blog.plan99.net/replication-studies-cant-fix-science...

biophysboy 31 minutes ago | parent [-]

? More samples reduce the variance of a statistic. Obviously that cannot identify systematic bias in a model, or establish causality, or make a "bad" question "good". It's not overrated though -- it would strengthen or weaken the case for many papers.

lxgr 13 minutes ago | parent | prev | next [-]

> LLMs being able to put out plausible papers is just going to make it worse

If correct form (LaTeX two-column formatting, quoting the right papers and authors of the year etc.) has been allowing otherwise reject-worthy papers to slip through peer review, academia arguably has bigger problems than LLMs.

godzillabrennus 2 hours ago | parent | prev | next [-]

Have they solved the issue where papers that cite research already invalidated are still being cited?

cogman10 2 hours ago | parent | next [-]

AFAIK, no, but I could see there being cause to push citations to also cite the validations. It'd be good if standard practice turned into something like

Paper A, by bob, bill, brad. Validated by Paper B by carol, clare, charlotte.

or

Paper A, by bob, bill, brad. Unvalidated.

gcr 2 hours ago | parent [-]

Academics typically use citation count and popularity as a rough proxy for validation. It's certainly not perfect, but it is something that people think about. Semantic Scholar in particular is doing great work in this area, making it easy to see who cites who: https://www.semanticscholar.org/

Google Scholar's PDF reader extension turns every hyperlinked citation into a popout card that shows citation counts inline in the PDF: https://chromewebstore.google.com/detail/google-scholar-pdf-...

reliabilityguy 2 hours ago | parent | prev [-]

Nope.

I am still reviewing papers that propose solutions based on a technique X, conveniently ignoring research from two years ago that shows that X cannot be used on its own. Both the paper I reviewed and the research showing X cannot be used are in the same venue!

b00ty4breakfast 2 hours ago | parent [-]

does it seem to be legitimate ignorance or maybe folks pushing ahead regardless of x being disproved?

reliabilityguy 42 minutes ago | parent | next [-]

Poor scholarship.

However, given the feedback by other reviewers, I was the only one who knew that X doesn’t work. I am not sure how these people mark themselves as “experts” in the field if they are not following the literature themselves.

freedomben 2 hours ago | parent | prev [-]

IMHO, it's mostly ignorance coming from a push/drive to "publish or perish." When the stakes are so high, output is so valued, and reproducibility isn't required, thorough work is disincentivized. The system is set up in a way that makes it fail.

There is also the reality that "one paper" or "one study" can be found contradicting almost anything, so if you just went with "some other paper/study debunks my premise" then you'd end up producing nothing. Plus many inside know that there's a lot of slop out there that gets published, so they can (sometimes reasonably IMHO) dismiss that "one paper" even when they do know about it.

It's (mostly) not fraud or malicious intent or ignorance, it's (mostly) humans existing in the system in which they must live.

f311a 2 hours ago | parent | prev | next [-]

For ML/AI/Comp sci articles, providing reproducible code is a great option. Basically, PoC or GTFO.

StableAlkyne an hour ago | parent [-]

The most annoying ones are those which loosely discuss the methodology but then fail to publish the weights or any real algorithms.

It's like buying a piece of furniture from IKEA, except you just get an Allen key, a hint at what parts to buy, and blurry instructions.

agumonkey an hour ago | parent | prev | next [-]

I think, or at least hope, that part of the value of LLMs will be to build tools that retire them for specific needs. Instead of asking an LLM to solve any problem directly, restrict the space to a tool that can help you reach your goal faster, without the statistical nature of LLMs.

benob 36 minutes ago | parent | prev | next [-]

Maybe it will also change the whole practice of using publication as the evaluation of science.

j45 2 hours ago | parent | prev [-]

It will better expose the behaviour of false scientists.

gcr 2 hours ago | parent | prev | next [-]

NeurIPS leadership doesn’t think hallucinated references are necessarily disqualifying; see the full article from Fortune for a statement from them: https://archive.ph/yizHN

> When reached for comment, the NeurIPS board shared the following statement: “The usage of LLMs in papers at AI conferences is rapidly evolving, and NeurIPS is actively monitoring developments. In previous years, we piloted policies regarding the use of LLMs, and in 2025, reviewers were instructed to flag hallucinations. Regarding the findings of this specific work, we emphasize that significantly more effort is required to determine the implications. Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference). As always, NeurIPS is committed to evolving the review and authorship process to best ensure scientific rigor and to identify ways that LLMs can be used to enhance author and reviewer capabilities.”

jklinger410 2 hours ago | parent | next [-]

> the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference)

Maybe I'm overreacting, but this feels like an insanely biased response. They found the one potentially innocuous reason and latched onto that as a way to hand-wave the entire problem away.

Science already had a reproducibility problem, and it now has a hallucination problem. Considering the massive influence the private sector has on both the work and the institutions themselves, the future of open science is looking bleak.

paulmist an hour ago | parent | next [-]

Isn't disqualifying X months of potentially great research due to a malformed, but existing, reference harsh? I don't think they'd be okay with references that are actually made up.

jklinger410 20 minutes ago | parent | next [-]

When your entire job is confirming that science is valid, I expect a little more humility when it turns out you've missed a critical aspect.

How did these 100 sources even get through the validation process?

> Isn't disqualifying X months of potentially great research due to a misformed, but existing reference harsh?

It will serve as a reminder not to cut any corners.

zipy124 2 minutes ago | parent | prev | next [-]

Science relies on trust... a lot. So things which show dishonesty are penalised greatly. If we were to remove trust, then peer reviewing a paper might take months of work, or even years.

suddenlybananas an hour ago | parent | prev [-]

It's a sign of dishonesty, not a perfect one, but an indicator.

orbital-decay an hour ago | parent | prev [-]

The wording is not hand-wavy. They said "not necessarily invalidated", which could mean that innocuous reason and nothing extra.

mikkupikku 18 minutes ago | parent | next [-]

Even if some of those innocuous mistakes happen, we'll all be better off if we accept people making those mistakes as acceptable casualties in an unforgiving campaign against academic fraudsters.

It's like arguing against strict liability for drunk driving because maybe somebody accidentally let their grape juice sit too long and they didn't know it was fermented... I can conceive of such a thing, but that doesn't mean we should go easy on drunk driving.

jklinger410 17 minutes ago | parent | prev [-]

I really think it is. The primary function of these publications is to validate science. When we find invalid citations, it shows they're not doing their job. When they get called on that, they cite the volume of work their publication puts out and latch onto the one potentially non-disqualifying explanation.

Seems like CYA, seems like hand wave. Seems like excuses.

derf_ an hour ago | parent | prev | next [-]

This will continue to happen as long as it is effectively unpunished. Even retracting the paper would do little good, as odds are it would not have been written if the author could not have used an LLM, so they are no worse off for having tried. Scientific publications are mostly a numbers game at this point. It is just one more example of a situation where behaving badly is much cheaper than policing bad behavior, and until incentives are changed to account for that, it will only get worse.

mlmonkey an hour ago | parent | prev | next [-]

Why not run every submitted paper through GPTZero (before sending to reviewers) and summarily reject any paper with a hallucination?

gcr 42 minutes ago | parent [-]

That's how GPTZero wants to situate themselves.

Who would pay them? Conference organizers are already unpaid and understaffed, and most conferences aren't profitable.

I think rejections shouldn't be automatic. Sometimes there are just typos. Sometimes authors don't understand BibTeX. This needs to be done in a way that reduces the workload for reviewers.

One way of doing this would be for GPTZero to annotate each paper during the review step. If reviewers could review a version of each paper with yellow-highlighted "likely-hallucinated" references in the bibliography, then they'd bring it up in their review and they'd know to be on their guard for other probable LLM-isms. If there are only a couple of likely typos in the references, then reviewers could understand that, and if they care about it, they'd bring it up in their reviews and the author would have the usual opportunity to rebut.

I don't know if GPTZero is willing to provide this service "for free" to the academic community, but if they are, it's probably worth bringing up at the next PAMI-TC meeting for CVPR.

Aurornis an hour ago | parent | prev | next [-]

> Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated.

This statement isn’t wrong, as the rest of the paper could still be correct.

However, when I see a blatant falsification somewhere in a paper I’m immediately suspicious of everything else. Authors who take lazy shortcuts when convenient usually don’t just do it once, they do it wherever they think they can get away with it. It’s a slippery slope from letting an LLM handle citations to letting the LLM write things for you to letting the LLM interpret the data. The latter opens the door to hallucinated results and statistics, as anyone who has experimented with LLMs for data analysis will discover eventually.

empath75 2 hours ago | parent | prev | next [-]

I think a _single_ instance of an LLM hallucination should be enough to retract the whole paper and ban further submissions.

andy99 2 hours ago | parent | next [-]

   For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex
This is equivalent to a typo. I’d like to know which “hallucinations” are completely made up, and which have a corresponding paper but contain some error in how it’s cited. The latter I don’t think matters.
burkaman an hour ago | parent [-]

If you click on the article you can see a full list of the hallucinations they found. They did put in the effort to look for plausible partial matches, but most of them are some variation of "No author or title match. Doesn't exist in publication."

Here's a random one I picked as an example.

Paper: https://openreview.net/pdf?id=IiEtQPGVyV

Reference: Asma Issa, George Mohler, and John Johnson. Paraphrase identification using deep contextualized representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 517–526, 2018.

Asma Issa and John Johnson don't appear to exist. George Mohler does, but it doesn't look like he works in this area (https://www.georgemohler.com/). No paper with that title exists. There are some with sort of similar titles (https://arxiv.org/html/2212.06933v2 for example), but none that really make sense as a citation in this context. EMNLP 2018 exists (https://aclanthology.org/D18-1.pdf), but that page range is not a single paper. There are papers in there that contain the phrases "paraphrase identification" and "deep contextualized representations", so you can see how an LLM might have come up with this title.

gcr 2 hours ago | parent | prev | next [-]

Going through a retraction and blacklisting process is also a lot of work -- collecting evidence, giving authors a chance to respond and mediate discussion, etc.

Labor is the bottleneck. There aren't enough academics who volunteer to help organize conferences.

(If a reader of this comment is qualified to review papers and wants to step up to the plate and help do some work in this area, please email the program chairs of your favorite conference and let them know. They'll eagerly put you to work.)

pessimizer 2 hours ago | parent [-]

That's exactly why the inclusion of a hallucinated reference is actually a blessing. Instead of going back and forth with the fraudster, just tell them to find the paper. If they can't, case closed. Massive amount of time and money saved.

gcr an hour ago | parent [-]

Isn't telling them to find the paper just "going back and forth with a fraudster"?

One "simple" way of doing this would be to automate it. Have authors step through a lint step when their camera-ready paper is uploaded. Authors would be asked to confirm each reference and link it to a google scholar citation. Maybe the easy references could be auto-populated. Non-public references could be resolved by uploading a signed statement or something.

There's no current way of using this metadata, but it could be nice for future systems.
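
A minimal sketch of what that lint step might look like, assuming a refs.bib file and the bibtexparser package (the Scholar linking and the confirmation UI are left out):

    import bibtexparser  # assumes the bibtexparser package is available

    def lint_bibliography(path):
        # Queue every entry without a resolvable identifier for a manual
        # "yes, this reference exists" confirmation at camera-ready upload.
        with open(path) as f:
            db = bibtexparser.load(f)
        needs_confirmation = []
        for entry in db.entries:
            if not entry.get("doi") and not entry.get("url"):
                needs_confirmation.append((entry.get("ID"), entry.get("title", "")))
        return needs_confirmation

    for key, title in lint_bibliography("refs.bib"):
        print(f"Please confirm reference [{key}]: {title}")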

Even the Scholar team within Google is woefully understaffed.

My gut tells me that it's probably more efficient to just drag authors who do this into some public execution or twitter mob after-the-fact. CVPR does this every so often for authors who submit the same paper to multiple venues. You don't need a lot of samples for deterrence to take effect. That's kind of what this article is doing, in a sense.

wing-_-nuts 2 hours ago | parent | prev [-]

I dunno about banning them, humans without LLMs make mistakes all the time, but I would definitely place them under much harder scrutiny in the future.

pessimizer an hour ago | parent [-]

Hallucinations aren't mistakes, they're fabrications. The two are probably referred to by the same word in some languages.

Institutions can choose an arbitrary approach to mistakes; maybe they don't mind a lot of them because they want to take risks and be on the bleeding edge. But any flexible attitude towards fabrications is simply corruption. The connected in-crowd will get mercy and the outgroup will get the hammer. Anybody criticizing the differential treatment will be accused of supporting the outgroup fraudsters.

gcr an hour ago | parent [-]

Fabrications carry intent to deceive. I don't think hallucinations necessarily do. If anything, they're a matter of negligence, not deception.

Think of it this way: if I wanted to commit pure academic fraud maliciously, I wouldn't make up a fake reference. Instead, I'd find an existing related paper and merely misrepresent it to support my own claims. That way, the deception is much harder to discover and I'd have plausible deniability -- "oh I just misunderstood what they were saying."

I think most academic fraud happens in the figures, not the citations. Researchers are more likely to be successful at making up data points than making up references, because it's impossible to know without the data files.

direwolf20 18 minutes ago | parent [-]

Generating a paper with an LLM is already academic fraud. You, the fraudster, are trying to optimize your fraud-to-effort ratio which is why you don't bother to look for existing papers to mis-cite.

Analemma_ 2 hours ago | parent | prev [-]

Kinda gives the whole game away, doesn’t it? “It doesn’t actually matter if the citations are hallucinated.”

In fairness, NeurIPS is just saying out loud what everyone already knows. Most citations in published science are useless junk: it’s either mutual back-scratching to juice h-index, or it’s the embedded and pointless practice of overcitation, like “Human beings need clean water to survive (Franz, 2002)”.

Really, hallucinated citations are just forcing a reckoning which has been overdue for a while now.

fc417fc802 an hour ago | parent | next [-]

> Most citations in published science are useless junk:

Can't say that matches my experience at all. Once I've found a useful paper on a topic, I primarily navigate the literature from there by traveling up and down the citation graph. It's extremely effective in practice, and it's continued to get easier as the digitization of metadata has improved over the years.

jacquesm 2 hours ago | parent | prev [-]

There should be a way to drop any kind of circular citation ring from the indexes.

gcr 2 hours ago | parent [-]

It's tough because some great citations are still hard to find/procure. I sometimes refer to papers that aren't on the Internet (e.g. wonderful old books / journals).

jacquesm an hour ago | parent [-]

But that actually strengthens those citations. The "I scratch your back, you scratch mine" ones are the ones I'm getting at, and that is quite hard to do with old and wonderful stuff; the authors there are probably not in a position to reciprocate, by virtue of observing the grass from the other side.

gcr an hour ago | parent [-]

I think it's a hard problem. The semanticscholar folks are doing the sort of work that would allow them to track this; I wonder if they've thought about it.

A somewhat-related parable: I once worked in a larger lab with several subteams submitting to the same conference. Sometimes the work we did was related, so we both cited each other's paper which was also under review at the same venue. (These were flavor citations in the "related work" section for completeness, not material to our arguments.) In the review copy, the reference lists the other paper as written by "anonymous (also under review at XXXX2025)," also emphasized by a footnote to explain the situation to reviewers. When it came time to submit the camera-ready copy, we either removed the anonymization or replaced it with an arxiv link if the other team's paper got rejected. :-) I doubt this practice improved either paper's chances of getting accepted.

Are these the sorts of citation rings you're talking about? If authors misrepresented the work as if it were accepted, or pretended it was published last year or something, I'd agree with you, but it's not too uncommon in my area for well-connected authors to cite manuscripts in process. I don't think it's a problem as long as they don't lean on them.

jacquesm an hour ago | parent [-]

No, I'm talking about the ones where the citation itself is almost or even completely irrelevant and used as a way to inflate the citation count of the authors. You could find those by checking whether or not the value as a reference (i.e., whether it contributes to the understanding of the paper you are reading) is exceeded by the value of the linkage itself.
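
As a rough illustration of that idea, a hedged sketch that flags author pairs who cite each other heavily in both directions; the input format and the min_mutual threshold are made up, and a real check would also need to account for legitimate collaboration and field size:

    from collections import Counter
    from itertools import product

    def reciprocal_citation_pairs(papers, min_mutual=5):
        # `papers` is a hypothetical pre-resolved view of the citation graph:
        #   {"authors": ["alice"], "cites_authors": ["bob", "carol"]}
        pair_counts = Counter()
        for paper in papers:
            for a, b in product(paper["authors"], paper["cites_authors"]):
                if a != b:
                    pair_counts[(a, b)] += 1
        flagged = []
        for (a, b), n_ab in pair_counts.items():
            n_ba = pair_counts.get((b, a), 0)
            # Heavy citation in both directions is worth a human look
            if a < b and n_ab >= min_mutual and n_ba >= min_mutual:
                flagged.append((a, b, n_ab, n_ba))
        return flagged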

direwolf20 2 hours ago | parent | prev | next [-]

Wow! They're literally submitting references to papers by Firstname Lastname, John Doe and Jane Smith and nobody is noticing or punishing them.

emil-lp 2 hours ago | parent | next [-]

They might (I hope) still be punished after discovery.

an0malous 2 hours ago | parent | prev | next [-]

It’s the way of the future

heliumtera 2 hours ago | parent | prev [-]

Maybe "muh science" was always a fucking joke and the only difference being now we can point to an undeniable proof it is a fucking joke?

sigbottle 40 minutes ago | parent | next [-]

I'm a Feyerabend sympathizer, but even he wouldn't have gone this far.

He was against establishment dogma, not pro anti-intellectualism.

azan_ 2 hours ago | parent | prev | next [-]

Yes, it only led to all advancements in the history of humanity, what a joke!

heliumtera 27 minutes ago | parent [-]

I am sure all advancements in the history of humanity were properly peer reviewed!

Including coca cola and Linux!

azan_ 19 minutes ago | parent [-]

If you wanted to attack peer review you should've attacked peer review, not science as a whole. And if "muh science" was some kind of code for peer review then it's not my fault that you are awful at articulating your point. It's still not clear what the hell you mean.

biophysboy 10 minutes ago | parent | prev | next [-]

> muh science

4chan is for dimwits

Sharlin an hour ago | parent | prev [-]

Aaand "the insane take of the day" award goes to…

gcr 2 hours ago | parent | prev | next [-]

I was getting completely AI-generated reviews for a WACV publication back in 2024. The area chairs are so overworked that authors don't have much recourse, which sucks but is also really hard to handle unless more volunteers step up to the plate to help organize the conference.

(If you're qualified to review papers, please email the program chair of your favorite conference and let them know -- they really need the help!)

As for my review, the review form has a textbox for a summary, a textbox for strengths, a textbox for weaknesses, and a textbox for overall thoughts. The review I received included one complete set of summary/strengths/weaknesses/closing thoughts in the summary text box, another distinct set of summary/strengths/weaknesses/closing thoughts in the strengths, another complete and distinct review in the weaknesses, and a fourth complete review in the closing thoughts. Each of these four reviews was slightly different, and they contradicted each other.

The reviewer put my paper down as a weak reject, but also said "the pros greatly outweigh the cons."

They listed "innovative use of synthetic data" as a strength, and "reliance on synthetic data" as a weakness.

cubefox 15 minutes ago | parent [-]

Wow...

Nevermark 16 minutes ago | parent | prev | next [-]

With regard to confabulating (hallucinating) sources, or anything else, it is worth noting this is a first class training requirement imposed on models. Not models simply picking up the habit from humans.

When training a student, normally we expect a lack of knowledge early, and reward self-awareness, self-evaluation and self-disclosure of that.

But from the very first epoch of a model training run, when the model has all the ignorance of a dropped plate of spaghetti, we optimize the network to respond to information as anything from a typical human to an expert, without any base of understanding.

So the training practice for models is inherently extreme enforced “fake it until you make it”, to a degree far beyond any human context or culture.

(Regardless, humans need to verify, not to mention read, the sources they cite. But it will be nice when models can be trusted to accurately assess what they know/don't know too.)

neom 42 minutes ago | parent | prev | next [-]

I wrote before about my embarrassing time with ChatGPT during a period (https://news.ycombinator.com/item?id=44767601) - I decided to go back through those old 4o chats with 5.2 pro extended thinking, and the reply was pretty funny because it first slightly ridiculed me, heh - but what it showed was: basically I would say "what 5 research papers from any area of science talk to these ideas" and it would find 1 and invent 4 if it didn't know 4 others, and not tell me, and then I'd keep working with it and it would invent what it thought might be in the papers along the way, making up new papers in its own work to cite to make its own work valid, lol. Anyway, I'm a moron, sure, and no real harm came of it for me, just still slightly shook I let that happen to me.

smallpipe 2 hours ago | parent | prev | next [-]

Could you run a similar analysis for pre-2020 papers? It'd be interesting to know how prevalent making up sources was before LLMs.

tasuki an hour ago | parent | next [-]

Also, it'd be interesting how many pre-2020 papers their "AI detector" marks as AI-generated. I distrust LLMs somewhat, but I distrust AI detectors even more.

theptip an hour ago | parent | prev [-]

Yeah, it’s kind of meaningless to attribute this to AI without measuring the base rate.

It’s for sure plausible that it’s increasing, but I’m certain this kind of thing happened with humans too.

doug_durham an hour ago | parent | prev | next [-]

Getting papers published is now more about embellishing your CV than about a sincere desire to present new research. I see this everywhere, at every level. Getting a paper published anywhere is a checkbox in completing your resume. As an industry we need to stop taking this into consideration when reviewing candidates or deciding pay. In some sense it has become an anti-signal.

biophysboy 5 minutes ago | parent | next [-]

I think it's fairer to say that perverse incentives have added more noise to the publishing signal. Publishing 0 times is not better than 100 times, even if 90% of those are Nth-author formality/politeness citations.

autoexec 19 minutes ago | parent | prev | next [-]

It'd be nice if there were a dedicated journal for papers published just because you have to publish for your CV or to get your degree. That way people can keep publishing for the sake of publishing, but you could see at a glance what the deal was.

londons_explore 32 minutes ago | parent | prev [-]

I'd like to see a financial approach to deciding pay by giving researchers a small and perhaps nonlinear or time bounded share of any profits that arise from their research.

Then peoples CV's could say "My inventions have led to $1M in licensing revenue" rather than "I presented a useless idea at a decent conference because I managed to make it sound exciting enough to get accepted".

autoexec 15 minutes ago | parent | next [-]

A lot of good research isn't ever going to make anyone a single dime, but that doesn't mean it doesn't matter.

direwolf20 17 minutes ago | parent | prev [-]

That's what patents do.

Molitor5901 2 hours ago | parent | prev | next [-]

AI might just extinguish the entire paradigm of publish or perish. The sheer volume of papers makes it nearly impossible to properly decide which papers have merit, which are non-replicable and suspect, and which are just a desperate rush to publish. The entire practice needs to end.

SJC_Hacker 41 minutes ago | parent | next [-]

It's not publish or perish so much as get grant money or perish.

Publishing is just the way to get grants.

A PI explained it to me once, something like this

Idea(s) -> Grant -> Experiments -> Data -> Paper(s) -> Publication(s) -> Idea(s) -> Grant(s)

That's the current cycle ... remove any step and it's a dead end.

shermantanktop 2 hours ago | parent | prev [-]

But how could we possibly evaluate faculty and researcher quality without counting widgets on an assembly line? /s

It’s a problem. The previous regime prior to publishing-mania was essentially a clubby game of reputation amongst peers based on cocktail party socialization.

The publication metrics came out of the harder sciences, I believe, and then spread to the softest of humanities. It was always easy to game a bit if you wanted to try, but now it’s trivial to defeat.

londons_explore 38 minutes ago | parent | prev | next [-]

And this is the tip of the iceberg, because these are the easy to check/validate things.

I'm sure plenty of more nuanced facts are also entirely without basis.

armcat 2 hours ago | parent | prev | next [-]

This is awful but hardly surprising. Someone mentioned reproducible code with the papers - but there is a high likelihood of the code being partially or fully AI generated as well. I.e. AI generated hypothesis -> AI produces code to implement and execute the hypothesis -> AI generates paper based on the hypothesis and the code.

Also: there were 15,000 submissions that were rejected at NeurIPS; it would be very interesting to see what % of those rejected were partially or fully AI generated/hallucinated. Are the ratios comparable?

blackbear_ an hour ago | parent [-]

Whether the code is AI generated or not is not important; what matters is that it really works.

Sharing code enables others to validate the method on a different dataset.

Even before LLMs came around there were lots of methods that looked good on paper but turned out not to work outside of accepted benchmarks

teekert 21 minutes ago | parent | prev | next [-]

We have the h-index and such; can we have something similar that goes down when you pull stunts like these? Preferably link it to people's ORCID iDs.

CGMthrowaway 2 hours ago | parent | prev | next [-]

Which is worse:

a) p-hacking and suppressing null results

b) hallucinations

c) falsifying data

Would be cool to see an analysis of this

amitav1 7 minutes ago | parent | next [-]

I'm doing some research, and this is something I'm unsure of. I see that "suppressing null results" is a bad thing, and I sort of agree, but for me personally, a lot of the null results are just the result of my own incompetence and don't contain any novel insights.

Proziam 2 hours ago | parent | prev [-]

All 3 of these should be categorized as fraud, and punished criminally.

internetter 2 hours ago | parent [-]

criminally feels excessive?

jacquesm 2 hours ago | parent | next [-]

You could make a good case for a white collar crime here, fraud for instance.

Proziam 2 hours ago | parent | prev [-]

If I steal hundreds of thousands of dollars (salary, plus research grants and other funds) and produce fake output, what do you think is appropriate?

To me, it's no different than stealing a car or tricking an old lady into handing over her fidelity account. You are stealing, and society says stealing is a criminal act.

WarmWash 2 hours ago | parent [-]

We have a civil court system to handle stuff like this already.

wat10000 an hour ago | parent | next [-]

We also have a criminal court system to handle stuff like this.

WarmWash an hour ago | parent [-]

No we don't. I've never seen a private contract dispute go to criminal court, probably because it's a civil matter.

If they actually committed theft, well then that already is illegal too.

But right now, doing "shitty research" isn't illegal and it's unlikely it ever will be.

wat10000 31 minutes ago | parent [-]

The claim is that this would qualify as fraud, which is also illegal.

If you do a search for "contractor imprisoned for fraud" you'll find plenty of cases where a private contract dispute resulted in criminal convictions for people who took money and then didn't do the work.

I don't know if taking money and then merely pretending to do the research would rise to the level of criminal fraud, but it doesn't seem completely outlandish.

Proziam an hour ago | parent | prev [-]

Stealing more than a few thousand dollars is a felony, and felonies are handled in criminal court, not civil.

EDIT - The threshold amount varies. Sometimes it's as low as a few hundred dollars. However, the point stands on its own, because there's no universe where the sum in question is in misdemeanor territory.

WarmWash an hour ago | parent [-]

It would fall under the domain of contract law, because maybe the contract of the grant doesn't prohibit what the researcher did. The way to determine that would be in court - civil court.

Most institutions aren't very chill with grant money being misused, so we already don't need to burden the state with getting Johnny municipal prosecutor to try and figure out if gamma crystallization imaging sources were incorrect.

leggerss 2 hours ago | parent | prev | next [-]

I don't understand: why aren't there automated tools to verify citations' existence? The data for a citation has a structured styling (APA, MLA, Chicago) and paper metadata is available via e.g. a web search, even if the paper contents are not

I guess GPTZero has such a tool. I'm confused why it isn't used more widely by paper authors and reviewers
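
One plausible building block, sketched here on the assumption that the public Crossref REST API is a good-enough index: fuzzy-match each cited title against Crossref's closest hits. A low score wouldn't prove fabrication (old books, typos, coverage gaps), but it would tell a reviewer where to look first.

    import requests
    from difflib import SequenceMatcher

    def title_probably_exists(title, threshold=0.9):
        # Ask Crossref for the closest bibliographic matches to a cited title
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": title, "rows": 3},
            timeout=10,
        )
        resp.raise_for_status()
        best = 0.0
        for item in resp.json()["message"]["items"]:
            for candidate in item.get("title", []):
                ratio = SequenceMatcher(None, title.lower(), candidate.lower()).ratio()
                best = max(best, ratio)
        return best >= threshold, best

    # Usage: pass in a title string pulled from a paper's reference list, e.g.
    # ok, score = title_probably_exists("Some cited paper title here")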

gh02t an hour ago | parent | next [-]

Citations are too open-ended, too prone to variation, and too full of legitimate minor mistakes that wouldn't bother a human verifier but would break automated tools, to be easily verified in their current form. DOI was supposed to solve some of the literal mechanical variation in establishing the existence of a source, but journal paywalls and limited adoption mean that is not a universal solution. Plus DOI still doesn't easily verify the factual accuracy of a citation, like "does the source say what the citation says it does," which is the most important part.

In my experience you will see considerable variation in citation formats, even in journals that strictly define them and require using BibTeX. And lots of journals leave their citation format rules very vague. It's a problem that runs deep.

leggerss 7 minutes ago | parent [-]

Thanks for the thoughtful reply!

eichin an hour ago | parent | prev [-]

Looks like GPTZero Source Finder was only released a year ago - if anything, I'm surprised slop-writers aren't using it preemptively, since they're "ahead of the curve" relative to reviewers on this sort of thing...

dev_l1x_be 19 minutes ago | parent | prev | next [-]

I am wondering if we are going to reach hallucination collapse sooner than we reach AGI.

mt_ an hour ago | parent | prev | next [-]

It would be ironic if the very detection of hallucinations contained hallucinations of its own.

theptip 2 hours ago | parent | prev | next [-]

This is mostly an ad for their product. But I bet you can get pretty good results with a Claude Code agent using a couple simple skills.

Should be extremely easy for AI to successfully detect hallucinated references as they are semi-structured data with an easily verifiable ground truth.

trash_cat 25 minutes ago | parent | prev | next [-]

Clearly there is some demand for those papers, and research, to exist. Good opportunity to fill the gaps.

yobbo an hour ago | parent | prev | next [-]

As long as these sorts of papers serve more important purposes for the careers of the authors than anything related to science or discovery of knowledge, then of course this happens and continues.

The best possible outcome is that these two purposes are disconflated, with follow-on consequences for the conferences and journals.

nospice an hour ago | parent | prev | next [-]

We've been talking about a "crisis of reproducibility" for years, along with the incentive to crank out high volumes of low-quality research. We now have a tool that brings the cost of producing plausible-looking research down to zero. So of course we're going to see that tool abused on a galactic scale.

But here's the thing: let's say you're a university or a research institution that wants to curtail it. You catch someone producing LLM slop, and you confirm it by analyzing their work and conducting internal interviews. You fire them. The fired researcher goes public saying that they were doing nothing of the sort and that this is a witch hunt. Their blog post makes it to the front page of HN, garnering tons of sympathy and prompting many angry calls to their ex-employer. It gets picked up by some mainstream outlets, too. This has happened a bunch of times.

In contrast, there are basically no consequences to institutions that let it slide. No one is angrily calling the employers of the authors of these 100 NeurIPS papers, right? If anything, there's the plausible deniability of "oh, I only asked ChatGPT to reformat the citations, the rest of the paper is 100% legit, my bad".

dtartarotti 2 hours ago | parent | prev | next [-]

It is very concerning that these hallucinations passed through peer review. It's not like peer review is a fool-proof method or anything, but the fact that reviewers did not check the references and notice clearly bogus ones is alarming, and could be a sign that the article authors weren't the only ones using LLMs in the process...

amanaplanacanal 2 hours ago | parent [-]

Is it common for peer reviewers to check references? Somehow I thought they mostly focused on whether the experiment looked reasonable and the conclusions followed.

emil-lp 2 hours ago | parent [-]

In journal publications it is, but without DOIs it's difficult.

In conference publications, it's less common.

Conference publications (like NeurIPS) are treated as announcements of results, not verified results.

empiko 2 hours ago | parent [-]

Nobody in ML or AI is verifying all your references. Reviewers will point out if you miss a super related work, but that's it. This is especially true with the recent (last two decades?) inflation in citation counts. You regularly have papers with 50+ references for all kinds of claims and random semirelated work. The citation culture is really uninspiring.

ctoth an hour ago | parent | prev | next [-]

How you know it's really real is that they clearly tell the FPR, and compare against a pre-LLM baseline.

But I saw it in Apple News, so MISSION ACCOMPLISHED!

bonsai_spool 2 hours ago | parent | prev | next [-]

This suggests that nobody was screening these papers in the first place, so is it actually significant that people are using LLMs in a setting without meaningful oversight?

These clearly aren't being peer-reviewed, so there's no natural check on LLM usage (which is different than what we see in work published in journals).

emil-lp 2 hours ago | parent | next [-]

As someone who reviews 20+ papers per year: we don't have time to verify each reference.

We verify: is the stuff correct, and is it worthy of publication (in the given venue) given that it is correct.

There is still some trust in the authors not to submit made-up stuff, albeit diminishing.

its_ethan 4 minutes ago | parent | next [-]

Sorry, but if someone makes a claim and cites a reference, how do you verify "is the stuff correct" without checking that reference?

paulmist an hour ago | parent | prev [-]

I'm surprised the conference doesn't provide tooling to validate all references automatically.

Sharlin an hour ago | parent [-]

How would you do that? Even in cases where there's a standard format, a DOI on every reference, and some giant online library of publication metadata, including everything that only exists in dead tree format, that just lets you check whether the cited work exists, not whether it's actually a relevant thing to cite in the context.

gcr 2 hours ago | parent | prev | next [-]

Academic venues don't have enough reviewers. This problem isn't new, and as publication volumes increase, it's getting sharply worse.

Consider the unit economics. Suppose NeurIPS gets 20,000 papers in one year. Suppose each author should expect three good reviews, so area chairs assign five reviewers per paper. In total, 100,000 reviews need to be written. It's a lot of work, even before factoring emergency reviewers in.

NeurIPS is one venue alongside CVPR, [IE]CCV, COLM, ICML, EMNLP, and so on. Not all of these conferences are as large as NeurIPS, but the field is smaller than you'd expect. I'd guess there are 300k-1m people in the world who are qualified to review AI papers.

khuey 2 hours ago | parent [-]

Seems like using tooling like this to identify papers with fake citations and auto-rejecting them before they ever get in front of a reviewer would kill two birds with one stone.

gcr 2 hours ago | parent [-]

It's not always possible to distinguish between fake citations and citations that are simply hard to find (e.g. wonderful old books that aren't on the Internet).

Another problem is that conferences move slowly and it's hard to adjust the publication workflow in such an invasive way. CVPR only recently moved from Microsoft's CMT to OpenReview to accept author submissions, for example.

There's a lot of opportunity for innovation in this space, but it's hard when everyone involved would need to agree to switch to a different workflow.

(Not shooting you down. It's just complicated because the people who would benefit are far away from the people who would need to do the work to support it...)

khuey 30 minutes ago | parent [-]

Sure, I agree that it's far from trivial to implement.

alain94040 2 hours ago | parent | prev [-]

When I was reviewing such papers, I didn't bother checking that 30+ citations were correctly indexed. I focused on the article itself, and maybe 1 or 2 citations that are important. That's it. For most citations, they are next to an argument that I know is correct, so why would I bother checking. What else do you expect? My job was to figure out if the article ideas are novel and interesting, not if they got all their citations right.

poulpy123 an hour ago | parent | prev | next [-]

All papers proven to have used an LLM beyond writing improvement should be automatically retracted.

captainbland 23 minutes ago | parent | prev | next [-]

What's wild is so many of these are from prestigious universities. MIT, Princeton, Oxford and Cambridge are all on there. It must be a terrible time to be an academic who's getting outcompeted by this slop because somebody from an institution with a better name submitted it.

cflewis 15 minutes ago | parent [-]

I'm going to be charitable and say that the papers from prestigious universities were honest mistakes rather than paper mill university fabrications.

One thing that has bothered me for a very long time is that computer science (and I assume other scientific fields) has long since decided that English is the lingua franca, and if you don't speak it you can't be part of it. Can you imagine being told that you could only do your research if you were able to write technical papers in a language you didn't speak, maybe even using glyphs you didn't know? It's crazy when you think about it even a little bit, but we ask it of so many. Let's not include the fact that 90% of the English-speaking population couldn't crank out a paper to the required vocabulary level anyway.

A very legitimate, not trying to cheat, use for LLMs is translation. While it would be an extremely broad and dangerous brush to paint with, I wonder if there is a correlation between English-as-a-Second (or even third)-Language authors and the hallucinations. That would indicate that they were trying to use LLMs to help craft the paper to the expected writing level. The only problem being that it sometimes mangles citations, and if you've done good work and got 25+ citations, it's easy for those errors to slip through.

nerdjon an hour ago | parent | prev | next [-]

The downstream effects of this are extremely concerning. We have already seen the damage caused by human written research that was later retracted like the “research” on vaccines causing autism.

As we get more and more papers that may be citing information that was originally hallucinated in the first place, we have a major reliability issue. What is worse, people who did not use AI in the first place will be caught in the crosshairs, since they will be referencing incorrect information.

There needs to be a serious amount of education done on what these tools can and cannot do and importantly where they fail. Too many people see these tools as magic since that is what the big companies are pushing them as.

Other than that we need to put in actual repercussions for publishing work created by an LLM without validating it (or just say you can’t in the first place but I guess that ship has sailed) or it will just keep happening. We can’t just ignore it and hope it won’t be a problem.

And yes, humans can make mistakes too. The difference is accountability and the ability to actually be unsure about something so you question yourself to validate.

geremiiah 2 hours ago | parent | prev | next [-]

A lot of research in AI/ML seems to me to be "fake it and never make it". Literally it's all about optics, posturing, connections, publicity. Lots of bullshit and little substance. This was true before AI slop, too. But the fact that AI slop can make it past review really showcases how much a paper's acceptance hinges on things other than the substance and results of the paper.

I even know PIs who got fame and funding based on some research direction that supposedly is going to be revolutionary. Except all they had were preliminary results that from one angle, if you squint, you can envision some good result. But then the result never comes. That's why I say, "fake it, and never make it".

meindnoch an hour ago | parent | prev | next [-]

Jamie, bring up their nationalities.

techIA 33 minutes ago | parent | prev | next [-]

They will turn it into a party drug.

brador an hour ago | parent | prev | next [-]

The problem isn’t scale.

The problem is consequences (lack of).

Doing this should get you barred from research. It won’t.

fulafel 2 hours ago | parent | prev | next [-]

Is there a comparison to rate of reference errors in other forums?

pandemic_region an hour ago | parent | prev | next [-]

What if they only accepted handwritten papers? Basically the current system is beyond repair, so we may as well go back to receiving 20 decent papers instead of 20k hallucinated ones.

CrzyLngPwd an hour ago | parent | prev | next [-]

This is not the AI future we dreamed of, or feared.

qwertox 2 hours ago | parent | prev | next [-]

It would be great if those scientists who use AI without disclosing it get fucked for life.

bwfan123 2 hours ago | parent | next [-]

> It would be great if those scientists who use AI without disclosing it get fucked for life.

There need to be disincentives for sloppy work. There is a tension between quality and quantity in almost every product. Unfortunately academia has become a numbers game with paper mills.

direwolf20 2 hours ago | parent | prev | next [-]

"scientists" FYI. Making shit up isn't science.

pandemic_region an hour ago | parent | prev | next [-]

Instead of publishing their papers in the prestigious zines - which is what they're after - we will publish them in "AI Slop Weekly" with name and picture. Up the submission risk a bit.

oofbey 2 hours ago | parent | prev | next [-]

Harsh sentiment. Pretty soon every knowledge worker will use AI every day. Should people disclose spellcheckers powered by AI? Disclosing is not useful. Being careful in how you use it and checking work is what matters.

geremiiah 2 hours ago | parent | next [-]

What they are doing is plainly cheating the system to get their 3 conference papers so they can get their $150k+ job at FAANG. It's plain cheating with no value.

WarmWash 2 hours ago | parent | next [-]

We are only looking at one side of the equation here, in this whole thread.

This feels a bit like the "LED stoplights shouldn't be used because they don't melt snow" argument.

mikkupikku 24 minutes ago | parent [-]

Confront the culprit and ask for their side; you'll just get some sob story about how busy they are and how they were only using the AI to check their grammar and they just don't know how the whole thing ended up fabricated... Waste of time. Just blacklist these people, they're no better than any other scammer.

barbazoo 2 hours ago | parent | prev | next [-]

People that cheat with AI now probably found ways to cheat before as well.

shermantanktop 2 hours ago | parent | prev [-]

Cheating by people in high status positions should get the hammer. But it gets the hand-wringing what-have-we-come-to treatment instead.

ambicapter 2 hours ago | parent | prev | next [-]

> Should people disclose spellcheckers powered by AI?

Thank you for that perfect example of a strawman argument! No, spellcheckers that use AI are not the main concern behind disclosing the use of AI in generating scientific papers, government reports, or any large block of nonfiction text that you paid for and that is supposed to make sense.

fisf 2 hours ago | parent | prev | next [-]

People are accountable for the results they produce using AI. So a scientist is responsible for made-up sources in their paper, which is plain fraud.

eichin an hour ago | parent | next [-]

"responsible for made up sources" leads to the hilarious idea that if you cite a paper that doesn't exist, you're now obliged to write that paper (getting it retroactively published might be a challenge though)

oofbey 2 hours ago | parent | prev [-]

I completely agree. But “disclosing the use of AI” doesn’t solve that one bit.

barbazoo 2 hours ago | parent [-]

I don’t disclose what keyboard I use to write my code or if I applied spellcheck afterward. The result is 100% theirs.

Sharlin an hour ago | parent | prev | next [-]

In general we're pretty good at drawing a line between purely editorial stuff like using a spellchecker, or even the services of a professional editor (no need to acknowledge), and independent intellectual contribution (must be acknowledged). There's no slippery slope.

duskdozer 2 hours ago | parent | prev | next [-]

>Pretty soon every knowledge worker will use AI every day.

Maybe? There's certainly a push to force the perception of inevitability.

vimda 2 hours ago | parent | prev | next [-]

"Pretty soon every knowledge worker will use AI every day" is a wild statement considering the reporting that most companies deploying AI solutions are seeing little to no benefit, but also, there's a pretty obvious gap between spell checkers and tools that generate large parts of the document for you

Proziam 2 hours ago | parent | prev | next [-]

False equivalence. This isn't about "using AI" it's about having an AI pretend to do your job.

What people are pissed about is the fact their tax dollars fund fake research. It's just fraud, pure and simple. And fraud should be punished brutally, especially in these cases, because the long tail of negative effects produces enormous damage.

freedomben an hour ago | parent [-]

I was originally thinking you were being way too harsh with your "punish criminally" take, but I must admit, you're winning me over. I think we would need to be careful to ensure we never (or realistically, very rarely) convict an innocent person, but this is in many cases outright theft/fraud when someone is making money or being "compensated" for producing work that is fraudulent.

For people who think this is too harsh, just remember we aren't talking about undergrads who cheat on a course paper here. We're talking about people who were given money (often from taxpayers) that committed fraud. This is textbook white collar crime, not some kid being lazy. At a minimum we should be taking all that money back from them and barring them from ever receiving grant money again. In some cases I think fines exceeding the money they received would be appropriate.

PunchyHamster 2 hours ago | parent | prev [-]

nice job moving the goalpost from "hallucinated the research/data" to "spellchecker error"

yesitcan 2 hours ago | parent | prev [-]

One fuck seems appropriate.

yepyeaisntityea an hour ago | parent | prev | next [-]

No surprises. Machine learning has, at least since 2012, been the go-to field for scammers and grifters. Machine learning, and technology in general, is basically a few real ideas, a small number of honest hard workers, and then millions of fad chasers and scammers.

Tom1380 2 hours ago | parent | prev | next [-]

No ETH Zurich, let's go

depressionalt 2 hours ago | parent | prev | next [-]

This is nice and all, but what repercussion does GPTZero get when their bullshit AI detection hallucinates a student using AI? And when that student receives academic discipline because of it?

Many such cases of this. More than 100!

They claim to have custom detection for GPT-5, Gemini, and Claude. They're making that up!

freedomben an hour ago | parent [-]

Indeed. My son has been accused by bullshit AI detection of having used AI, and it has devastated his work quality. After being "disciplined" for using AI (when he didn't), he now intentionally tries to "dumb down" his writing so that it doesn't sound so much like AI. The result is that he writes much worse. What a shitty, shitty outcome. I've even found myself leaving typos and things in (even on sites like HN) because if you write too well, inevitably some comment replier will call you out as being an LLM even when you aren't. I'm as annoyed by the LLM posts as everybody else, but the answer surely is not to dumb us down into Idiocracy.

Sharlin an hour ago | parent [-]

It's almost as if this whole LLM stuff wasn't a net benefit to the society after all.

jordanpg 2 hours ago | parent | prev | next [-]

If these are so easy to identify, why not just incorporate some kind of screening into the early stages of peer review?

tossandthrow 2 hours ago | parent | next [-]

What makes you believe they are easy to identify?

emil-lp 2 hours ago | parent [-]

One could require DOIs for each reference. That's both realistic to achieve and easy to verify.

Although then why not just cite existing papers for bogus reasons?
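
For the existence half of that, a minimal sketch, assuming the doi.org proxy's usual behavior of redirecting registered DOIs and returning 404 for unknown ones (it only checks existence; it can't catch citing a real paper for bogus reasons):

    import requests

    def doi_resolves(doi):
        # A redirect from the doi.org proxy means the DOI is registered;
        # a 404 means it isn't. Existence only -- not relevance or accuracy.
        resp = requests.head(f"https://doi.org/{doi}",
                             allow_redirects=False, timeout=10)
        return resp.status_code in (301, 302, 303)

    # Usage: doi_resolves("10.xxxx/xxxxx") with a DOI taken from the reference list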

DetectDefect 2 hours ago | parent | prev [-]

Because real work takes time and effort, and there is no real incentive for it here.

TAULIC15 2 hours ago | parent | prev [-]

OHHH IS GOOD