voidUpdate 10 hours ago

How many of these cases do we have to have before lawyers realise that they need to check that the things an LLM tells them are actually true?

Latty 10 hours ago | parent | next [-]

It doesn't matter, because any process that seems right most of the time but is occasionally wrong in subtle, hard-to-spot ways is basically a machine for lulling people into not checking, so stuff will always slip through.

It's just like cars that drive themselves but require you to jump in if there is a mistake: humans aren't going to react as fast as if they were driving, because they aren't going to be engaged, and no one can stay as engaged as they were when they were doing it themselves.

We need to stop pretending we can tell people they "just" need to check things from LLMs for accuracy; it's a process that inevitably leads to people not checking and things slipping through. Pretending it's the people's fault, when essentially everyone using it would eventually end up doing that, is stupid and won't solve the core problem.

chii 9 hours ago | parent | next [-]

> won't solve the core problem.

what's the core problem tho? Because if the core problem is "using AI", then it's an inevitable outcome - AI will be used, and there is always an incentive to cut costs maximally.

So realistically, the solution is to punish mistakes. We do this for bridges that collapse, for driver mistakes on roads, etc. The "easy" fix is to make punishment harsher for mistakes - whether an LLM was involved or not, the provenance of the mistake is irrelevant.

Latty 7 hours ago | parent | next [-]

The core problem is that the tool provides output that looks right and is right a lot of the time, but also slips in incorrect stuff in hard-to-notice ways.

Punishment isn't a solution, because it doesn't work. If you create a system that lulls people into a false sense of security, no punishment will stop them, because they aren't doing it thinking "it's worth the risk" - they don't see the risk at all. There are so many examples of this that it's weird people still think it actually works.

Furthermore, it becomes a liability-washing tool: companies will tell employees they have to take the time to check things, but then not give them the time required to actually check everything, and then blame employees when they do the only thing they can: let stuff slip.

If you want to use LLMs for this kind of thing, you need to create systems around them that make it hard to make the mistakes. As an example (obviously not a complete solution, just one part): if they cite a source, there should be a mandated automatic check that goes to that source, validates that it exists and that the cited text is actually there - without using LLMs. Exact solutions will vary based on the specific use case.
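
A rough sketch of what that non-LLM check could look like, in Python - purely illustrative, assuming the citation carries a resolvable URL and an exact quote (real sources would also need proper PDF/HTML text extraction, which is skipped here):

    import re
    import requests

    def citation_checks_out(url: str, quoted_text: str) -> bool:
        """Fetch the cited source and confirm the quoted passage appears in it.
        Deterministic, no LLM involved: any failure flags the citation for review."""
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            return False  # source unreachable or doesn't exist

        # Normalise whitespace so line breaks in the source don't cause false misses.
        squash = lambda s: re.sub(r"\s+", " ", s).strip().lower()
        return squash(quoted_text) in squash(response.text)

    # Placeholder data - in practice this comes from the document's citation list.
    citations = [{"url": "https://example.com/ruling", "quote": "the court held that"}]
    flagged = [c for c in citations if not citation_checks_out(c["url"], c["quote"])]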

An example from outside LLMs: we told users they should check the URL bar as a solution to phishing. In theory a user could always make sure they were on the right page and stop attacks. In practice people were always going to slip up. The correct solution was automated tooling that validates the URL (e.g. password managers, passkeys).
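
For illustration, the automated version of that check is just a strict origin comparison that never gets tired - a simplified sketch, not how any particular password manager is actually implemented:

    from urllib.parse import urlsplit

    def should_autofill(stored_origin: str, current_url: str) -> bool:
        """Only offer the saved credential when the page's origin exactly matches
        the origin it was saved for - the check users were told to do by eye,
        done mechanically on every page load."""
        cur, saved = urlsplit(current_url), urlsplit(stored_origin)
        return (cur.scheme, cur.hostname, cur.port) == (saved.scheme, saved.hostname, saved.port)

    # A look-alike domain fails no matter how convincing the page looks.
    should_autofill("https://paypal.com", "https://paypa1.com/login")  # -> False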

chii 7 hours ago | parent [-]

> The correct solution was automated tooling that validates the URL

that's because this particular problem has a solution.

The issue here is that there's no such tool to automatically validate the output of the LLM - at least, not yet, and I don't see a theoretical way to do it either.

And you're framing the punishment as getting fired from the job - which is true, but the company making the mistake also gets punished (or should be, if regulatory capture hasn't happened...). This results in direct losses for the company and shareholders (in the form of fines, recalls and/or replacements, etc.).

Latty 6 hours ago | parent [-]

> The issue here is that there's no such tool to automatically validate the output of the LLM - at least, not yet, and I don't see a theoretical way to do it either.

Yeah, it's never going to be possible to validate everything automatically, but you may be able to make the tool valuable enough to justify using it if you can make errors easier to spot. In all cases you need to ask if there is actually any gain from using the LLM and checking it, or if checking it well enough actually takes so much time that it loses its value. My point is that just blaming the user isn't a good solution.

> And you're framing the punishment as getting fired from the job - which is true, but the company making the mistake also gets punished (or should be, if regulatory capture hasn't happened...). This results in direct losses for the company and shareholders (in the form of fines, recalls and/or replacements, etc.).

Yes, regulation needs to be strong, because companies can and will accept these things as a cost of doing business, but people losing their jobs can be life-destroying. If companies aren't going to give people the time and tools to check this stuff, then the buck should stop with them, not with the employees they are forcing to take risks.

AnimalMuppet 8 hours ago | parent | prev [-]

The human is responsible. That's the fix. I don't care if you got the results from an LLM or from reading cracks in the sidewalk; you are responsible for what you say, and especially for what you say professionally. I mean, that's almost the definition of a professional.

And if you can't play by those rules, then maybe you aren't a professional, even if you happened to sneak your way into a job where professionalism is expected.

Latty 7 hours ago | parent [-]

This doesn't solve the problem, because companies will force people to use these tools and demand they work faster, eventually resulting in people slipping.

People will have to choose between being fired for being "too slow" or taking the risk that they end up liable. Most people can't afford to just lose their job, and will end up being pressured into taking the risk; then the companies will liability-wash by giving them the responsibility.

You need regulation that ensures companies can't just push the risk onto employees who can be rotated out to take the blame for mistakes.

chii 7 hours ago | parent [-]

> rotated out to take the blame for mistakes.

companies cannot rotate people out to take the blame - the company would have to suffer a fine for there to be a punishment.

Latty 6 hours ago | parent [-]

Right, but companies routinely accept fines as costs of doing business, while losing your job can destroy your life. If a company has not taken appropriate measures to ensure employees can reasonably catch errors at the rate they are required to work, then the company should take all the blame, because they are choosing to push employees to take risks.

dw_arthur 9 hours ago | parent | prev | next [-]

As someone who has done QA on white-collar work, I can say it's tiring looking for little errors in work reports. Most people are not cut out for it.

voidUpdate 10 hours ago | parent | prev | next [-]

Probably worth including a "bibliography" section of citations that can be automatically checked to confirm they actually exist, then.
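
For the easy case where references carry DOIs, that existence check is a handful of lines against the public Crossref lookup - a sketch under that assumption; legal citations or arXiv papers would need a different registry, and existence alone says nothing about whether the source supports the claim:

    import requests

    def doi_exists(doi: str) -> bool:
        """Ask the public Crossref API whether a DOI resolves to a real record."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    # Placeholder bibliography - real entries would be parsed out of the document.
    bibliography = {"Smith et al. 2021": "10.1000/xyz123"}
    unverified = [ref for ref, doi in bibliography.items() if not doi_exists(doi)]
    print("Could not verify:", unverified)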

lazide 10 hours ago | parent [-]

Not enough - you’d also need to check that they say/mean what is being implied. Which is a real problem.

mminer237 6 hours ago | parent [-]

To be fair, that's a problem with human authors too. Wikipedia is really well-cited, but it's common to check a citation and find it only says half of what a sentence does, while the rest seemingly has no basis in fact. Judges are supposed to actually read the citations to not only confirm the case exists and says what's being claimed, but often to also compare & contrast the situations to ensure that principle is applicable to the case at hand.

lazide 5 hours ago | parent [-]

Yup. The issue with LLMs is not that any specific thing they do is unique; it's that they do it at previously unimaginable volume, scale, and accessibility.

macintux 10 hours ago | parent | prev [-]

Even disregarding self-driving features, it seems like the smarter we make cars, the dumber the drivers get. Daytime running lights (DRLs) are great, until they let you drive around all night long with no tail lights and dim front lighting because you're not paying enough attention to what's actually turned on.

duskdozer 10 hours ago | parent | prev | next [-]

I'm continually amazed at how much faith people have in them. I guess since they can sound like people and output really authoritative and confident text it just overrides any skepticism subconsciously?

ben_w 10 hours ago | parent | next [-]

Much as I like them, I do frequently remind myself of two things:

1) https://en.wikipedia.org/wiki/Clever_Hans

2) https://archive.org/details/nextgen-issue-26 as an example of how in the 90s we had rapid cycles of a new tech (3D graphics) astounding us with how realistic each new generation was compared to the previous one, and forgetting with each new (game engine) how we'd said the same and felt the same about (graphics) we now regarded as pathetic.

So yes, they do output "authoritative and confident text" that "just overrides any skepticism subconsciously", but you shouldn't be amazed; we've always been like this.

pjc50 10 hours ago | parent | prev | next [-]

The advertising campaign is incredible.

PunchyHamster 10 hours ago | parent | prev | next [-]

Yes, just as with politicians. And LLMs have been thoroughly tuned to appear that way.

direwolf20 9 hours ago | parent | prev | next [-]

https://en.wikipedia.org/wiki/ELIZA_effect

moron4hire 10 hours ago | parent | prev [-]

It's mind-boggling how much people claim to like LLMs when you would never design any other piece of software to operate the way LLMs do. Designing a system that interacts with the user through natural text creates an awful experience. It slows down every interaction as you dig through all the prose to get to the key information. It turns every computer interaction into a school math word problem.

LunaSea 10 hours ago | parent | prev | next [-]

It doesn't matter anymore.

LLMs just revealed what a decadent society we have set up for ourselves worldwide.

coffeefirst 10 hours ago | parent | prev | next [-]

It’s worse than that. We’re hearing about the lawyers and Ars Technica because the consequences are public and the errors are egregious.

It’s likely happening to everyone.

probably_wrong 8 hours ago | parent | prev | next [-]

Just this week I tracked down the citations of a scientific paper (whose authors could very well be here) where 25% of the citations were made up and 50% of the remaining ones were wrong, taking ArXiv papers and citing them as belonging to (say) IJCLR.

It's not just lawyers.

AJ007 9 hours ago | parent | prev | next [-]

This whole thing is silly; LLMs can automate reference validation.

If someone is a lawyer, accountant, doctor, teacher, surgeon, engineer, etc., and is regurgitating answers that were pumped out with GPT-5-extra-low or whatever mediocre throttled model they are using, they should just be fired and de-credentialed. Right now this is easy.

The real problem is ahead: 99.999% of future content that exists will be made using generative AI. For many people using Facebook, Instagram, TikTok, or some other non-sequential, engagement-weighted feed, 50%+ of the content they consume today is fake. As that stuff spreads into modern culture it's going to be an endless battle to keep it out of stuff that should not be publishing fake content (e.g. the New York Times or Wall Street Journal; excluding scientific journals, which seem to have abandoned validation and basic statistics a long time ago).

Much of the future value and profit margins might just be in valid data?

raincole 9 hours ago | parent | next [-]

> Right now this is easy.

Easy? In the US you need house impeachment to fire a judge. In some countries judges are completely immune unless they are sentenced for crimes.

mminer237 6 hours ago | parent | next [-]

To fire a federal judge. Local judges, which are the vast majority, can be fired by their colleagues or replaced in elections.

voidUpdate 9 hours ago | parent | prev [-]

Do you need impeachment to fire a lawyer, accountant, doctor, teacher, surgeon or engineer?

raincole 9 hours ago | parent [-]

Nope, and the article is about a judge. What's the point of incentivizing lawyers to carefully verify their references when they know the judge has no incentive to read them and can just make shit up anyway?

miltonlost 9 hours ago | parent | prev [-]

> This whole thing is silly; LLMs can automate reference validation.

Can they though with 100% accuracy and no hallucinations? Wouldn't you still need to validate that they validated correctly?

zthrowaway 10 hours ago | parent | prev | next [-]

Do we see this a lot in the US? This seems to be fairly specific to India.

tw04 10 hours ago | parent | next [-]

It’s happening A LOT in the US too. Mainstream media just doesn’t seem to find it that newsworthy.

https://arstechnica.com/tech-policy/2026/02/randomly-quoting...

malshe 8 hours ago | parent | prev | next [-]

From the article:

> In October, two federal judges in the US were called out for the use of AI tools which led to errors in their rulings. In June 2025, the High Court of England and Wales warned lawyers not to use AI-generated case material after a series of cases cited fictitious or partially made up rulings.

duskdozer 10 hours ago | parent | prev [-]

today: https://news.ycombinator.com/item?id=47231189

YeGoblynQueenne 9 hours ago | parent | prev [-]

What kind of AI is this, that you constantly need a human to check its work? Do you think Jean-Luc Picard had to constantly check the output of the Enterprise computer? No, he didn't. If AI is not better than humans, then what the heck is the point? You might as well just use humans.