Latty 10 hours ago

It doesn't matter, because any process that seems right most of the time but is occasionally wrong in subtle, hard-to-spot ways is basically a machine for lulling people into not checking, so stuff will always slip through.

It's just like cars that drive themselves but need you to jump in if there is a mistake: humans are not going to react as fast as if they were driving, because they aren't going to be engaged, and no one can stay as engaged as they were when they were doing it themselves.

We need to stop pretending we can tell people they "just" need to check LLM output for accuracy; it's a process that inevitably leads to people not checking and things slipping through. Blaming the people, when essentially everyone using it would eventually end up doing that, is stupid and won't solve the core problem.

chii 9 hours ago | parent | next [-]

> won't solve the core problem.

What's the core problem, though? Because if the core problem is "using AI", then it's an inevitable outcome: AI will be used, and there is always an incentive to cut costs maximally.

So realistically, the solution is to punish mistakes. We do this for bridges that collapse, for driver mistakes on roads, etc. The "easy" fix is to make punishment harsher for mistakes - whether an LLM was involved or not, the pedigree of the mistake is irrelevant.

Latty 7 hours ago | parent | next [-]

The core problem is that the tool provides output that looks right and is right a lot of the time, but also slips in incorrect stuff in a hard-to-notice way.

Punishment isn't a solution because it doesn't work. If you create a system that lulls people into a sense of security, no punishment will stop them, because they aren't acting on a "it's worth the risk" calculation - they don't see the risk at all. There are so many examples of this that it's weird people still think it works.

Furthermore, it becomes a liability-washing tool: companies will tell employees they have to take the time to check things, but then not give them the time required to actually check everything, and then blame employees when they do the only thing they can: let stuff slip.

If you want to use LLMs for this kind of thing, you need to create systems around them that make it hard to make the mistakes. As an example (obviously not a complete solution, just one part): if they cite a source, there should be a mandated automatic check that goes to that source, validates that it exists, and confirms that the cited text is actually there - without using LLMs. Exact solutions will vary based on the specific use case.
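The deterministic check described here can be sketched in a few lines. Everything below is hypothetical - the function names (`quote_appears`, `check_citation`) and the injected `fetch` callable are illustrative, not any real library's API; a real system would add retries, archiving, and paywall handling:

```python
import re

def quote_appears(page_text: str, quoted: str) -> bool:
    # Normalize whitespace and case so line wrapping or typographic
    # differences in the source page don't cause false negatives,
    # then require an exact substring match - no LLM involved.
    def norm(s: str) -> str:
        return re.sub(r"\s+", " ", s).strip().lower()
    return norm(quoted) in norm(page_text)

def check_citation(fetch, url: str, quoted: str) -> dict:
    # `fetch(url)` is assumed to return the page body as a string,
    # or raise on a network error / missing page. Injecting it keeps
    # the check testable offline (e.g. pass urllib-based code in prod).
    try:
        body = fetch(url)
    except Exception as exc:
        return {"url": url, "exists": False,
                "quote_found": False, "error": str(exc)}
    return {"url": url, "exists": True,
            "quote_found": quote_appears(body, quoted), "error": None}
```

A pipeline could run this over every citation an LLM emits and block output whose sources don't resolve or don't contain the quoted text, which catches fabricated citations even though it can't judge whether the quote supports the claim.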

An example from outside LLMs: we told users to check the URL bar as a solution to phishing. In theory a user could always make sure they were on the right page and stop attacks. In practice people were always going to slip up. The correct solution was automated tooling that validates the URL (e.g. password managers, passkeys).

chii 7 hours ago | parent [-]

> The correct solution was automated tooling that validates the URL

that's because this particular problem has a solution.

The issue here is that there's no such tool to automatically validate the output of the LLM - at least not yet, and I don't see a theoretical way to do it either.

And you're framing the punishment as getting fired from the job - which is true, but the company making the mistake also gets punished (or should be, if regulatory capture hasn't happened...). This results in direct losses for the company and its shareholders (in the form of fines, recalls, and/or replacements, etc.).

Latty 6 hours ago | parent [-]

> The issue here is that there's no such tool to automatically validate the output of the LLM - at least not yet, and I don't see a theoretical way to do it either.

Yeah, it's never going to be possible to validate everything automatically, but you may be able to make the tool valuable enough to justify using it if you can make errors easier to spot. In all cases you need to ask whether there is actually any gain from using the LLM and checking it, or whether checking it well enough takes so much time that it loses its value. My point is that just blaming the user isn't a good solution.

> And you're framing the punishment as getting fired from the job - which is true, but the company making the mistake also gets punished (or should be, if regulatory capture hasn't happened...). This results in direct losses for the company and its shareholders (in the form of fines, recalls, and/or replacements, etc.).

Yes, regulation needs to be strong, because companies can and will accept these things as a cost of doing business, while losing a job can be life-destroying. If companies won't give people the time and tools to check this stuff, then the buck should stop with them, not with the employees they are forcing to take risks.

AnimalMuppet 8 hours ago | parent | prev [-]

The human is responsible. That's the fix. I don't care if you got the results from an LLM or from reading cracks in the sidewalk; you are responsible for what you say, and especially for what you say professionally. I mean, that's almost the definition of a professional.

And if you can't play by those rules, then maybe you aren't a professional, even if you happened to sneak your way into a job where professionalism is expected.

Latty 7 hours ago | parent [-]

This doesn't solve the problem, because companies will force people to use these tools and demand they work faster, eventually resulting in people slipping up.

People will have to choose between being fired for being "too slow" or taking the risk of ending up liable. Most people can't afford to just lose their job, and will end up being pressured into taking the risk; then the companies will liability-wash by shifting the responsibility onto them.

You need regulation that ensures companies can't just push the risk onto employees who can be rotated out to take the blame for mistakes.

chii 7 hours ago | parent [-]

> rotated out to take the blame for mistakes.

Companies cannot rotate people out to take the blame - the company would have to suffer a fine for there to be a punishment.

Latty 6 hours ago | parent [-]

Right, but companies routinely accept fines as costs of doing business, while losing your job can destroy your life. If a company has not taken appropriate measures to ensure employees can reasonably catch errors at the rate they are required to work, then the company should take all the blame, because they are choosing to push employees to take risks.

dw_arthur 9 hours ago | parent | prev | next [-]

As someone who has done QA on white-collar work, I can say it's tiring looking for little errors in reports. Most people are not cut out for it.

voidUpdate 10 hours ago | parent | prev | next [-]

Probably worth including a "bibliography" section of citations that can be automatically checked to confirm they actually exist, then.

lazide 10 hours ago | parent [-]

Not enough - you’d also need to check that they say/mean what is being implied. Which is a real problem.

mminer237 6 hours ago | parent [-]

To be fair, that's a problem with human authors too. Wikipedia is really well cited, but it's common to check a citation and find it only supports half of what the sentence says, while the rest seemingly has no basis in fact. Judges are supposed to actually read the citations not only to confirm the case exists and says what's being claimed, but often also to compare and contrast the situations to ensure that principle is applicable to the case at hand.

lazide 5 hours ago | parent [-]

Yup. The issue with LLMs is not that any specific thing they do is unique. Rather, it's that they do it at previously unimaginable volume, scale, and accessibility.

macintux 10 hours ago | parent | prev [-]

Even disregarding self-driving features, it seems like the smarter we make cars, the dumber the drivers get. Daytime running lights (DRLs) are great, until they let you drive around all night with no tail lights and dim front lighting because you're not paying enough attention to what's actually turned on.