xvilka a day ago

Code correctness should be checked automatically by CI and the test suite, and new tests should be added. That is exactly what keeps these stupid errors from bothering the reviewer. The same goes for code formatting and documentation.

merely-unlikely a day ago

This discussion makes me think peer review needs more automated tooling, somewhat analogous to what software engineers have long relied on. For example, a tool could use an LLM to check that a citation actually substantiates the claim the paper says it does, or else flag the claim for review.
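
A minimal sketch of what such a check might look like, assuming a hypothetical ask_llm() helper standing in for whatever model you call (none of this is a real API):

    from dataclasses import dataclass

    @dataclass
    class Citation:
        claim: str        # what the citing paper says the source shows
        source_text: str  # abstract or relevant excerpt of the cited paper

    def ask_llm(prompt: str) -> str:
        """Hypothetical stand-in for a call to whatever LLM you use."""
        raise NotImplementedError

    def substantiates(c: Citation) -> bool:
        """Ask the model whether the cited text supports the claim."""
        prompt = (
            "Claim made by citing paper:\n" + c.claim + "\n\n"
            "Excerpt from cited paper:\n" + c.source_text + "\n\n"
            "Does the excerpt substantiate the claim? Answer YES or NO."
        )
        return ask_llm(prompt).strip().upper().startswith("YES")

    def flag_for_review(citations: list[Citation]) -> list[Citation]:
        """Claims the model does not endorse get a human look."""
        return [c for c in citations if not substantiates(c)]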

noitpmeder a day ago

I'd go one further and say all published papers should come with a clear list of "claimed truths", and one should only be able to cite a paper by linking to one of those explicit truths.

Then you could build a true hierarchy of citation dependencies, checked 'statically', and get a better indication of the impact when a fundamental truth is disproven, ...
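
A sketch of the data structure this implies, with made-up claim IDs: each paper publishes named claims, citations point at a specific claim, and when one is disproven you walk the reverse dependencies.

    from collections import defaultdict, deque

    # Maps claim_id -> the set of claim_ids that cite (depend on) it.
    dependents: dict[str, set[str]] = defaultdict(set)

    def cite(citing_claim: str, cited_claim: str) -> None:
        """Record that citing_claim rests on cited_claim."""
        dependents[cited_claim].add(citing_claim)

    def impacted_by(disproven: str) -> set[str]:
        """BFS over reverse dependencies: everything that transitively
        rests on the disproven claim."""
        seen, queue = set(), deque([disproven])
        while queue:
            for dep in dependents[queue.popleft()]:
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
        return seen

    # Toy example with made-up claim IDs.
    cite("paperB:thm1", "paperA:claim3")  # paperB's theorem 1 cites paperA's claim 3
    cite("paperC:cor2", "paperB:thm1")    # paperC's corollary 2 cites paperB's theorem 1
    print(impacted_by("paperA:claim3"))   # both downstream claims are impacted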

vkou a day ago

Have you authored a lot of non-CS papers?

Could you provide a proof-of-concept paper for that sort of thing? Not a toy example: an actual example, derived from messy real-world data, in a non-trivial[1] field?

---

[1] Any field is non-trivial when you get deep enough into it.

alexcdot a day ago

Hey, I'm part of the GPTZero team that built the automated tooling used to get the results in that article!

Totally agree with your thinking here. We can't just hand this to an LLM, because you need industry-specific standards for what counts as a hallucination or a match, and for how to do the search.

thfuran a day ago

What exactly is the analogy you’re suggesting, using LLMs to verify the citations?

tpoacher a day ago

Not OP, but that wouldn't really be necessary.

One could submit their BibTeX files and expect the citations to be verifiable by a low-level checker.

Worst case, if your BibTeX citation were a variant of one in the checker's database, you'd be asked to correct it to match the canonical version.
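
As a sketch, a checker along those lines could normalize each field and compare the entry against a canonical record keyed by DOI (the database and the DOI below are made up):

    import re

    def normalize(field: str) -> str:
        """Strip BibTeX braces and collapse whitespace for comparison."""
        cleaned = field.replace("{", "").replace("}", "")
        return re.sub(r"\s+", " ", cleaned).strip().lower()

    # Canonical entries, e.g. pulled from DBLP or Crossref, keyed by DOI.
    CANONICAL = {
        "10.1000/example": {"title": "An Example Paper", "year": "2020"},
    }

    def check(entry: dict[str, str]) -> list[str]:
        """Return the fields that deviate from the canonical record."""
        canon = CANONICAL.get(entry.get("doi", ""))
        if canon is None:
            return ["unknown DOI: entry flagged for manual review"]
        return [
            f"{field}: expected {canon[field]!r}"
            for field in canon
            if normalize(entry.get(field, "")) != normalize(canon[field])
        ]

    print(check({"doi": "10.1000/example",
                 "title": "{An} {Example} Paper",
                 "year": "2021"}))  # flags the year mismatch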

However, as others here have stated, hallucinated "citations" are actually the lesser problem. Citing irrelevant papers based on a fly-by reference is a much harder one; it existed even before LLMs, but it has become far worse with them.

thfuran a day ago

Yes, I think verifying the mere existence of the cited paper barely moves the needle. I mean, I guess automated verification of that is a cheap rejection criterion, but I don't think it's very useful overall.

alexcdot a day ago

Really good point. One of the cofounders of GPTZero here!

The tool GPTZero used in the article also detects whether the citation supports the claim; scroll to "cited information accuracy" here: https://app.gptzero.me/documents/1641652a-c598-453f-9c94-e0b...

This is still in beta because it's a much harder problem for sure: it's hard to determine whether a 40-page paper supports a claim (if the paper claims X is computationally intractable, does that mean algorithms that compute approximations of X are slow?).
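
One way to cut that down, as a sketch reusing the same hypothetical ask_llm() helper as above: retrieve only the passages most lexically similar to the claim and judge those, rather than the whole 40 pages.

    def ask_llm(prompt: str) -> str:
        """Hypothetical LLM call, same stand-in as in the earlier sketch."""
        raise NotImplementedError

    def top_passages(claim: str, paper_text: str, k: int = 3) -> list[str]:
        """Crude retrieval: rank paragraphs by word overlap with the claim."""
        claim_words = set(claim.lower().split())
        paragraphs = [p for p in paper_text.split("\n\n") if p.strip()]
        return sorted(paragraphs,
                      key=lambda p: len(claim_words & set(p.lower().split())),
                      reverse=True)[:k]

    def supports(claim: str, paper_text: str) -> str:
        """Judge the claim against only the retrieved passages."""
        context = "\n---\n".join(top_passages(claim, paper_text))
        return ask_llm("Passages:\n" + context + "\n\nClaim:\n" + claim +
                       "\n\nDo the passages support the claim? "
                       "Answer SUPPORTED, UNSUPPORTED, or UNCLEAR.")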