The core problem is that the tool produces output that looks right, and often is right, but also slips in errors that are hard to notice.

Punishment isn't a solution because it doesn't work. If you create a system that lulls people into a sense of security, no punishment will stop them: they aren't thinking "it's worth the risk", they simply don't see the risk. There are so many examples of this that it's weird people still think it works.

Furthermore, it becomes a liability-washing tool: companies will tell employees they have to take the time to check things, but then not give them the time required to actually check everything, and then blame employees when they do the only thing they can: let stuff slip.

If you want to use LLMs for this kind of thing, you need to build systems around them that make the mistakes hard to make. As an example (obviously not a complete solution, just one part): if they cite a source, there should be a mandated automatic check, one that does not itself use an LLM, that goes to that source, validates that it exists, and confirms the cited text is actually there. Exact solutions will vary based on the specific use case.
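A minimal sketch of that kind of citation check, to make the idea concrete. Everything named here is hypothetical: `check_citation`, the `fetch_source` callback, and the in-memory `CORPUS` stand in for whatever retrieval a real system would use. The matching is a plain normalized substring test, with no LLM involved, and it would miss paraphrases or quotes with elisions.

```python
import re
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping doesn't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_citation(
    source_id: str,
    quoted_text: str,
    fetch_source: Callable[[str], Optional[str]],
) -> str:
    """Return 'ok', 'source_missing', or 'quote_missing' for one citation.

    fetch_source is an assumed callback that returns the source's full text,
    or None if the source cannot be found.
    """
    source_text = fetch_source(source_id)
    if source_text is None:
        return "source_missing"
    needle = normalize(quoted_text)
    # An empty quote would trivially match, so treat it as a failure.
    if not needle or needle not in normalize(source_text):
        return "quote_missing"
    return "ok"


# Hypothetical stand-in for a real document store or HTTP fetch.
CORPUS = {"doc-1": "The quick brown fox\njumps over the lazy dog."}

for source_id, quote in [
    ("doc-1", "brown fox jumps"),
    ("doc-1", "purple fox"),
    ("doc-2", "anything"),
]:
    print(source_id, repr(quote), check_citation(source_id, quote, CORPUS.get))
```

The point of the design is that a failed check blocks or flags the output automatically, rather than relying on the user remembering to look.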

An example from outside LLMs: we told users to check the URL bar as a solution to phishing. In theory a user could always make sure they were on the right page and stop attacks. In practice people were always going to slip up. The correct solution was automated tooling that validates the URL (e.g. password managers, passkeys).
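Roughly what that automated validation looks like, as a simplified sketch. The function names are made up for illustration, and real password managers and passkey implementations use their own matching rules (some match on the registrable domain rather than the exact origin). The idea is only that the tool compares origins mechanically, so a lookalike domain never gets the credential offered.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Reduce a URL to (scheme, host, port), filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def should_autofill(saved_url: str, current_url: str) -> bool:
    """Offer the saved credential only when the origins match exactly."""
    return origin(saved_url) == origin(current_url)


saved = "https://example.com/login"
print(should_autofill(saved, "https://example.com:443/account"))
print(should_autofill(saved, "https://examp1e.com/login"))
```

The user never has to spot the difference between `example.com` and `examp1e.com`; the comparison does it for them.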

chii 5 hours ago | parent [-]

> The correct solution was automated tooling that validates the URL

That's because this particular problem has a solution.

The issue here is that there's no such tool to automatically validate the output of the LLM - at least, not yet, and I don't see the theoretical way to do it either.

And you're treating the punishment as getting fired from the job - which is true, but the company making the mistake also gets punished (or should be, if regulatory capture hasn't happened...). This results in direct losses for the company and shareholders (in the form of fines, recalls and/or replacements, etc.).

Latty 4 hours ago | parent [-]

> The issue here is that there's no such tool to automatically validate the output of the LLM - at least, not yet, and I don't see the theoretical way to do it either.

Yeah, it's never going to be possible to validate everything automatically, but you may be able to make the tool valuable enough to justify using it if you can make errors easier to spot. In all cases you need to ask if there is actually any gain from using the LLM and checking it, or if checking it well enough takes so much time that it loses its value. My point is that just blaming the user isn't a good solution.

> And you're treating the punishment as getting fired from the job - which is true, but the company making the mistake also gets punished (or should be, if regulatory capture hasn't happened...). This results in direct losses for the company and shareholders (in the form of fines, recalls and/or replacements, etc.).

Yes, regulation needs to be strong because companies can and will accept these things as a cost of doing business, but people losing their jobs can be life-destroying. If companies aren't going to give people the time and tools to check this stuff, then the buck should stop with them, not the employees they are forcing to take risks.