This is true, but here the equivalent situation is someone using a greek question mark (";") instead of a semicolon (";"), and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.

Yes in theory you can go through every semicolon to check if it's not actually a greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.

So if you think you might have reasonably missed greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
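To make the analogy concrete, here is a minimal sketch of the kind of mechanical check a reviewer's eyes cannot do but a tool can. The `CONFUSABLES` table and the `find_confusables` helper are hypothetical names introduced for illustration, and the table is deliberately tiny; a real check would rely on a full confusables list.

```python
import unicodedata

# Hypothetical, deliberately tiny table of look-alike characters.
# Assumption: a real tool would load a complete confusables list instead.
CONFUSABLES = {
    "\u037e": ";",  # GREEK QUESTION MARK, renders like a semicolon
    "\u0430": "a",  # CYRILLIC SMALL LETTER A, renders like a Latin "a"
}


def find_confusables(source: str):
    """Return (line, column, Unicode name, ASCII look-alike) for each hit."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in CONFUSABLES:
                hits.append((line_no, col, unicodedata.name(ch), CONFUSABLES[ch]))
    return hits


# The second statement ends in a Greek question mark, not a semicolon.
sample = "int x = 1;\nint y = 2\u037e\n"
for hit in find_confusables(sample):
    print(hit)
```

The two lines of `sample` look identical in most fonts, which is the point: only the scan tells them apart.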

▲

scythmic_waves a day ago | parent | next [-]

> as a code reviewer [you] are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.

As a PR reviewer I frequently pull down the code and run it. Especially if I'm suggesting changes because I want to make sure my suggestion is correct.

Do other PR reviewers not do this?

▲

dataflow a day ago | parent | next [-]

I don't commonly do this and I don't know many people who do this frequently either. But it depends strongly on the code, the risks, the gains of doing so, the contributor, the project, the state of testing and how else an error would get caught (I guess this is another way of saying "it depends on the risks"), etc.

E.g. you can imagine that if I'm reviewing changes in authentication logic, I'm obviously going to put a lot more effort into validation than if I'm reviewing a container and wondering if it would be faster as a hashtable instead of a tree.

> because I want to make sure my suggestion is correct.

In this case I would just ask "have you already also tried X" which is much faster than pulling their code, implementing your suggestion, and waiting for a build and test to run.

▲

tpoacher a day ago | parent | prev | next [-]

I do too, but this is a conference, I doubt code was provided.

And even then, what you're describing isn't review per se, it's replication. In principle there are entire journals that one can submit replication reports to, which count as actual peer reviewable publications in themselves. So one needs to be pragmatic with what is expected from a peer review (especially given the imbalance between resources invested to create one versus the lack of resources offered and lack of any meaningful reward)

▲

Majromax a day ago | parent [-]

> I do too, but this is a conference, I doubt code was provided.

Machine learning conferences generally encourage (anonymized) submission of code. However, that still doesn't mean that replication is easy.

Even if the data is also available, replication of results might require impractical levels of compute power; it's not realistic to ask a peer reviewer to pony up for a cloud account to reproduce even medium-scale results.

▲

lesam a day ago | parent | prev | next [-]

If there’s anything I would want to run to verify, I ask the author to add a unit test. Generally, the existing CI test + new tests in the PR having run successfully is enough. I might pull and run it if I am not sure whether a particular edge case is handled.

Reviewers wanting to pull and run many PRs makes me think your automated tests need improvement.

▲

Terr_ a day ago | parent | prev | next [-]

I don't, but that's because ensuring the PR compiles and passes old+new automated tests is an enforced requirement before it goes out.

So running it myself involves judging other risks, much higher-level ones than bad unicode characters, like the GUI button being in the wrong place.

▲

grayhatter a day ago | parent | prev | next [-]

> Do other PR reviewers not do this?

Some do; many (like peer reviewers) are unable to consider the consequences of their negligence.

But it's always a welcome reminder that some people care about doing good work. That's easy to forget browsing HN, so I appreciate the reminder :)

▲

vkou a day ago | parent | prev [-]

> Do other PR reviewers not do this?

No, because this is usually a waste of time, because CI enforces that the code and the tests can run at submission time. If your CI isn't doing it, you should put some work in to configure it.

If you regularly have to do this, your codebase should probably have more tests. If you don't trust the author, you should ask them to include test cases for whatever it is that you are concerned about.

▲

grayhatter a day ago | parent | prev | next [-]

> This is true, but here the equivalent situation is someone using a greek question mark (";") instead of a semicolon (";"),

No it's not. I think you're trying to make a different point, because you're using an example of a specific deliberate malicious way to hide a token error that prevents compilation, but is visually similar.

> and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.

What weird world are you living in where you don't have CI? Also, it's pretty common I'll test code locally when reviewing something more complex or more important, if I don't have CI.

> Yes in theory you can go through every semicolon to check if it's not actually a greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.

I don't, because it won't compile. Not because I assume good faith. References and citations are similar to introducing dependencies. We're talking about completely fabricated deps. e.g. This engineer went on npm and grabbed the first package that said left-pad but it's actually a crypto miner. We're not talking about a citation missing a page number, or publication year. We're talking about something that's completely incorrect, being represented as relevant.

> So if you think you might have reasonably missed greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.

I would never miss this, because the important thing is code needs to compile. If it doesn't compile, it doesn't reach the master branch. Peer review of a paper doesn't have CI, I'm aware, but it's also not vulnerable to syntax errors like that. A paper with a fake semicolon isn't meaningfully different, so this analogy doesn't map to the fraud I'm commenting on.

▲

tpoacher a day ago | parent [-]

you have completely missed the point of the analogy.

breaking the analogy beyond the point where it is useful by introducing non-generalising specifics is not a useful argument. Otherwise I can counter your more specific non-generalising analogy by introducing little green aliens sabotaging your imaginary CI with the same ease and effect.

▲

grayhatter a day ago | parent [-]

I disagree you could do that and claim to be reasonable.

But I agree, because I'd rather discuss the pragmatics and not bicker over the semantics about an analogy.

Introducing a token error is different from plagiarism, no? Someone writing code that can't compile is different from someone "stealing" proprietary code from some company and contributing it to some FOSS repo?

In order to assume good faith, you also need to assume the author is the origin. But that's clearly not the case. The origin is from somewhere else, and the author that put their name on the paper didn't verify it, and didn't credit it.

▲

tpoacher a day ago | parent [-]

Sure but the focus here is on the reviewer not the author.

The point is what is expected as reasonable review before one can "sign their name on it".

"Lazy" (or possibly malicious) authors will always have incentives to cut corners as long as no mechanisms exist to reject (or even penalise) the paper on submission automatically. Which would be the equivalent of a "compiler error" in the code analogy.

Effectively the point is, in the absence of such tools, the reviewer can only reasonably be expected to "look over the paper" for high-level issues; catching such low-level issues via manual checks by reviewers has massively diminishing returns for the extra effort involved.

So I don't think the conference shaming the reviewers here in the absence of providing such tooling is appropriate.

▲

xvilka a day ago | parent | prev [-]

Code correctness should be checked automatically with the CI and testsuite. New tests should be added. This is exactly what makes sure these stupid errors don't bother the reviewer. Same for the code formatting and documentation.

▲

merely-unlikely a day ago | parent | next [-]

This discussion makes me think peer reviews need more automated tooling somewhat analogous to what software engineers have long relied on. For example, a tool could use an LLM to check that the citation actually substantiates the claim the paper says it does, or else flag the claim for review.

▲

noitpmeder a day ago | parent | next [-]

I'd go one further and say all published papers should come with a clear list of "claimed truths", and one is only able to cite said paper if they are linking in to an explicit truth.

Then you can build a true hierarchy of citation dependencies, checked 'statically', and have better indications of impact if a fundamental truth is disproven, ...
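A toy sketch of what such a "static" check might look like, assuming each claim has an identifier and lists the claim identifiers it relies on. The `cites` mapping, the identifiers, and both helper functions are hypothetical; nothing like this exists for real papers as far as this thread establishes.

```python
from collections import defaultdict, deque


def dangling_citations(cites, published):
    """Return cited claim ids that no published paper actually declares."""
    return {dep for deps in cites.values() for dep in deps if dep not in published}


def affected_claims(cites, disproven):
    """Return every claim that relies, directly or transitively, on `disproven`."""
    # Invert the "relies on" mapping so we can walk from a claim to its dependents.
    dependents = defaultdict(set)
    for claim, deps in cites.items():
        for dep in deps:
            dependents[dep].add(claim)

    seen = set()
    queue = deque([disproven])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Made-up claim ids in the form "paper:claim number".
cites = {
    "B:1": ["A:1"],
    "C:1": ["B:1"],
    "C:2": ["A:2", "Z:9"],
}
published = {"A:1", "A:2", "B:1", "C:1", "C:2"}

print(dangling_citations(cites, published))  # claims cited but never declared
print(affected_claims(cites, "A:1"))         # claims to revisit if A:1 falls
```

The graph walk is the easy part; deciding what counts as a single "claimed truth" in a real paper is the hard part this sketch does not address.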

▲

vkou a day ago | parent [-]

Have you authored a lot of non-CS papers?

Could you provide a proof of concept paper for that sort of thing? Not a toy example, an actual example, derived from messy real-world data, in a non-trivial[1] field?

---

[1] Any field is non-trivial when you get deep enough into it.

▲

alexcdot a day ago | parent | prev [-]

hey, i'm a part of the gptzero team that built automated tooling, to get the results in that article!

totally agree with your thinking here, we can't just give this to an LLM, because of the need to have industry-specific standards for what is a hallucination / match, and how to do the search

▲

thfuran a day ago | parent | prev [-]

What exactly is the analogy you’re suggesting, using LLMs to verify the citations?

▲

tpoacher a day ago | parent [-]

not OP, but that wouldn't really be necessary.

One could submit their bibtex files and expect bibtex citations to be verifiable using a low level checker.

Worst case scenario if your bibtex citation was a variant of one in the checker database you'd be asked to correct it to match the canonical version.

However, as others here have stated, hallucinated "citations" are actually the lesser problem. Citing irrelevant papers based on a fly-by reference is a much harder problem; this was present even before LLMs, but this has now become far worse with LLMs.
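The low-level existence check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the "checker database" is a plain list of canonical titles, the titles are made up, the similarity threshold is arbitrary, and matching on title alone is a simplification (a real checker would presumably also compare authors, year, and DOI).

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def check_citation(title, canonical_titles, threshold=0.9):
    """Classify a cited title as "match", "variant", or "unknown".

    "variant" means the author should correct the entry to the canonical
    form that is returned alongside the status.
    """
    if title in canonical_titles:
        return ("match", title)

    key = normalize(title)
    best, best_score = None, 0.0
    for candidate in canonical_titles:
        score = SequenceMatcher(None, key, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score

    if best is not None and best_score >= threshold:
        return ("variant", best)
    return ("unknown", None)


# Hypothetical database with a made-up title.
database = ["An Example Paper on Graph Search"]

print(check_citation("An Example Paper on Graph Search", database))
print(check_citation("An example paper on graph search.", database))
print(check_citation("Completely Different Nonexistent Work", database))
```

A check like this only establishes that a citation exists; it says nothing about whether the cited paper is relevant to the claim.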

▲

thfuran a day ago | parent [-]

Yes, I think verifying mere existence of the cited paper barely moves the needle. I mean, I guess automated verification of that is a cheap rejection criterion, but I don’t think it’s overall very useful.

▲

alexcdot a day ago | parent [-]

really good point. one of the cofounders of gptzero here!

the tool gptzero used in the article also detects if the citation supports the claim too, if you scroll to "cited information accuracy" here: https://app.gptzero.me/documents/1641652a-c598-453f-9c94-e0b...

this is still in beta because it's a much harder problem for sure, since it's hard to determine if a 40 page paper supports a claim (if the paper claims X is computationally intractable, does that mean algorithms to compute approximate X are slow?)