davedx a day ago

Humans, legacy algorithmic systems, and LLMs have different error modes.

- Legacy systems typically have error modes where integrations or user interfaces break in annoying but obvious ways. Pure algorithms calculating things like payroll tend to be (relatively) rigorously developed and are highly deterministic.

- LLMs have error modes closer to those of humans than to those of legacy systems, but more limited. They're non-deterministic, sometimes make up answers, and almost never admit they can't do something; sometimes they also make plain errors in arithmetic or logic.

- Humans have even more unpredictable error modes; on top of the errors seen in LLMs, they also have emotion, fatigue, org politics, demotivation, misaligned incentives, and so on. Because we've been working with other humans for ten thousand years, we've gotten fairly good at managing each other... but it's still challenging.

LLMs probably need a mixture of "correctness tests" (like evals/unit tests) and "management" (human-in-the-loop).
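
To make that mixture concrete, here is a rough sketch of what I have in mind, not a real framework. Every name in it (`triage`, `Verdict`, `sums_to_total`) is made up for illustration, and the "LLM output" is just a string. Deterministic checks play the role of evals/unit tests, and anything that fails them gets routed to a person instead of being accepted silently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types and names; nothing here comes from a real LLM library.
Check = Callable[[str], bool]


@dataclass
class Verdict:
    accepted: bool            # passed every deterministic check
    needs_human: bool         # should be escalated to a reviewer
    failed_checks: List[str]  # names of the checks that failed


def triage(output: str, checks: Dict[str, Check]) -> Verdict:
    """Run each correctness check; escalate to a human if any fails."""
    failed = [name for name, check in checks.items() if not check(output)]
    if failed:
        return Verdict(accepted=False, needs_human=True, failed_checks=failed)
    return Verdict(accepted=True, needs_human=False, failed_checks=[])


def sums_to_total(output: str) -> bool:
    """Example check: an answer shaped like 'a + b = c' must add up."""
    try:
        left, right = output.split("=")
        return sum(int(term) for term in left.split("+")) == int(right)
    except ValueError:
        # Not parseable as 'a + b = c', so treat it as a failed check.
        return False


checks = {"arithmetic": sums_to_total}
good = triage("2 + 2 = 4", checks)
bad = triage("2 + 2 = 5", checks)
```

Here `good` is accepted and `bad` is flagged for review with `failed_checks == ["arithmetic"]`. Passing the checks only means the output cleared the tests you thought to write, so a real setup would probably still have a person spot-check some of the accepted outputs.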