> These aren't benchmark items with public answer keys — they're claims real users submitted for verification to a fact-checking platform.

Cool.

I wonder if any of this matters when the authors don't disclose exactly how much of their report was written and made with LLMs in the first place. There is even an "11. Ethics & data use" section, and the research is about LLMs being infallible in some ways, yet the usage of LLMs for the production of this report isn't mentioned even once.

▲

kostaj 3 hours ago | parent [-]

Data collection and processing were done manually. LLMs helped with the report drafting. Everything was human reviewed before publishing.

▲

embedding-shape 2 hours ago | parent | next [-]

So it's not a secret; why don't you add this up front in the report? The report itself is about LLMs, so it makes a lot of sense to disclose your usage of them for writing it, especially when you're presenting evidence that boils down to LLMs being infallible.

▲

rpdillon 29 minutes ago | parent | next [-]

I think you mean fallible.

It's also a bit weird to "disclose use of LLMs". It rubs me the wrong way, the same way parents breathlessly talking about "screen time" rubbed me the wrong way: it's too general, and with such a broad brush, it's going to sweep up a bunch of perfectly fine usage along with a bunch of dubious usage. On the flip side, if folks do start disclosing all the time, it's going to turn into Prop 65 warnings in CA, where everything says it has lead in it, so folks pretty much ignore it and move on.

If the report's conclusions and reasoning lean on LLMs, or if the data processing itself was done with LLMs, that would be interesting, and I wouldn't treat it as some sort of disclosure, but rather discuss it under methodology. Using LLMs to polish the language a bit after writing an initial draft with key findings? Much less interesting.

I realize this is now a religious issue, and some folks are allergic to anything that touched an LLM. I just don't think that perspective is going to end up having a good shelf life.

▲

kostaj 2 hours ago | parent | prev [-]

It's an omission on my side. Will add in the next version.

▲

embedding-shape 2 hours ago | parent | next [-]

I think you might be able to edit the website to add this, even if you aren't willing to make the report a bit more honest up front. I'm sure you realize that this website/article will now be sent around to a lot of people, many of whom won't realize exactly how this was written, because they don't read HN comments and only skim the page contents. I think most would (incorrectly) assume a report about infallible LLMs was not written by LLMs, especially when the authors are the same ones who made the report itself.

▲

aaron695 an hour ago | parent | prev [-]

[dead]

▲

Aurornis an hour ago | parent | prev [-]

> LLMs helped with the report drafting. Everything was human reviewed before publishing.

This is becoming the classic way of admitting an LLM wrote it.

Leaving that out of the report validated the complaint above.