Remix.run Logo
layer8 3 hours ago

The point is that tests generally only test specific inputs and circumstances. They are a heuristic, but don’t generalize to all possible states and inputs. It’s like probing a mathematical function on some points, where the results being correct on the probed points doesn’t mean the function will yield the desired result on all points of its domain. If the tests are the only measure, they become the target.

The value of a good developer is that they generalize over all possible inputs and states. That’s something current LLMs can’t be trusted to do (yet?).

survirtual 5 minutes ago | parent [-]

Not relevant.

Hallucinations don't matter if the mechanics of the pipeline mitigate them. In other words, at a systems level, you can mitigate hallucinations. The agent level noise is not a concern.

This is no different from CPU design or any other noisy system. Transistors are not perfect and there is always error, so you need error correction. At a transistor level, CPUs are unreliable. At a systems level, they are clean and reliable.

This is no different. The stochastic noisiness of individual agents can be mitigated with redundancy, constraints, and error correction at a systems level.