figassis 6 hours ago

This is where the absolutism of letting agents do 100% of the work fails. You get adversarial agents pulling all references into a table; they might miss some, so run this a few times.

Then have another set of agents, with skills like web browsing, verify that links actually exist, and maybe that references and abstracts actually match, etc. Have one engineer (or agent) write a small script to help with this (just make sure you test it a bit).

So your work is not verified until your references table is 90% green checkmarks, maybe with uncertainty figures.

A human can then verify the ones with under 90% certainty.
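A rough sketch of the triage step described above, in Python. Everything here is an assumption for illustration: the `ReferenceCheck` fields, the `triage` and `green_fraction` helpers, and the idea that some earlier agent or script has already produced a certainty score per reference. It only shows how the table could be split into green checkmarks and a human review queue.

```python
from dataclasses import dataclass

# Assumption: mirrors the "under 90% certainty" cutoff from the comment above.
REVIEW_THRESHOLD = 0.90


@dataclass
class ReferenceCheck:
    """One row of the hypothetical references table."""
    citation: str        # the reference as it appears in the report
    link_resolves: bool  # whether an agent or script could open the source
    certainty: float     # 0.0 to 1.0, confidence that the citation matches the source


def triage(checks, threshold=REVIEW_THRESHOLD):
    """Split rows into (verified, needs_human).

    A row is only green if the link resolves AND certainty meets the threshold;
    everything else goes to the human review queue.
    """
    verified, needs_human = [], []
    for check in checks:
        if check.link_resolves and check.certainty >= threshold:
            verified.append(check)
        else:
            needs_human.append(check)
    return verified, needs_human


def green_fraction(checks, threshold=REVIEW_THRESHOLD):
    """Share of rows that count as green checkmarks (0.0 for an empty table)."""
    if not checks:
        return 0.0
    verified, _ = triage(checks, threshold)
    return len(verified) / len(checks)


if __name__ == "__main__":
    # Placeholder rows, not real citations.
    table = [
        ReferenceCheck("Reference A", True, 0.98),
        ReferenceCheck("Reference B", True, 0.72),
        ReferenceCheck("Reference C", False, 0.95),
    ]
    verified, needs_human = triage(table)
    print(f"green: {green_fraction(table):.0%}")
    for row in needs_human:
        print("needs human review:", row.citation)
```

Note that a dead link sends the row to a human even when the certainty score is high, since a score for a source nobody could open is not worth much.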

This alone gets you a long way there. It does not cost the millions they're being paid.

It's quite interesting that these companies marketed themselves as the best of the best in excellence, accepting no mistakes. I can imagine the countless keynotes and books about this. Or the sales pitches.

It has always been a lie; they just understood how to hide it. Today they don't, and it's embarrassing.

e12e 6 hours ago | parent | next [-]

> A human can then verify the ones with under 90% certainty.

How about the author actually reads the finished report a couple of times and checks all the references?

It really is the lowest bar - even lower maybe than running a spell check.

palmotea 5 hours ago | parent | next [-]

> How about the author actually reads the finished report a couple of times and checks all the references?

But then you wouldn't be embracing the new agentic ways of working!

danaris 3 hours ago | parent | prev | next [-]

How about the author actually, y'know

authors

the report?

SpicyLemonZest 4 hours ago | parent | prev [-]

The hallucinations here (https://gptzero.me/news/investigations-kpmg/) would have passed a cursory reference check. It's easy to see when it's laid out in a table that "BNP Paribas. AI Integration: Transforming Financial Journeys. The Banking Scene, 2025." is a false citation, because the title doesn't quite match and it wrongly attributes BNP Paribas authorship to an article written about BNP Paribas by some random Belgian guy doing business as "The Banking Scene". It'd be a lot harder to see when you're skimming through browser tab 9 of 45 and see all the key words match up.

e12e 2 hours ago | parent [-]

I'm not talking about a reference check by someone other than the author. You wouldn't put in a reference you hadn't read in the first place, since otherwise you couldn't formulate the text that relates to it?

Ed: thanks for the link - I hadn't seen that yet.

flowbarai 5 hours ago | parent | prev [-]

[flagged]