layer8 2 hours ago
> Once specs are captured as tests, the LLM can no longer hallucinate.

Tests are not a correctness proof. I can’t trust LLMs to correctly reason about their code, and tests are merely a sanity check; they can’t verify that the reasoning behind the code was correct.
survirtual an hour ago | parent
They do not need to be correctness proofs. With appropriate prompting and auditing, the tests let the LLM see whether the code functions as expected and iterate. They also serve as functionality documentation and audit documentation.

I also don't actually care whether it reasons properly. I care about results that eventually stabilize on a valid solution. Those results don't need to be based on "thinking"; they can be experimentally derived. Agents can own whatever domain they work in and acquire results with whatever methods they choose, given the constraints they are subject to.

I measure results by validating via e2e tests, penetration testing, and human testing. I also measure via architecture agents and code review agents that validate adherence to standards. If standards are violated, a deeper audit is conducted; if it becomes a pattern, the agent is modified until it stabilizes again.

This is more like numerical relaxation methods. You set the edge conditions / constraints, then iterate the system until it stabilizes on a solution. The solution in this case, however, is meta, because you are stabilizing on a set of agents that can stabilize on a solution.

Agents don't "reason" or "think", and I don't need to trust them. I trust only results.
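To make the relaxation analogy concrete, here is a minimal sketch of 1D Jacobi-style relaxation (illustrative only; the function name, tolerance, and values are made up, not anything from my actual setup). The endpoints are the fixed edge conditions; the interior is iterated until successive passes stop changing.

    # Minimal 1D relaxation sketch: boundary values are fixed constraints,
    # interior values are repeatedly averaged until the update is negligible.

    def relax(values, tol=1e-6, max_iters=10_000):
        """Jacobi-style relaxation on a 1D list; endpoints are the fixed edge conditions."""
        for _ in range(max_iters):
            new = values[:]
            for i in range(1, len(values) - 1):
                new[i] = 0.5 * (values[i - 1] + values[i + 1])  # local averaging update
            delta = max(abs(a - b) for a, b in zip(new, values))
            values = new
            if delta < tol:  # stabilized: further iteration changes nothing meaningful
                break
        return values

    # Edge conditions: 0.0 on the left, 1.0 on the right; interior starts arbitrary.
    print(relax([0.0, 0.3, 0.9, 0.2, 1.0]))

The agent setup is the same loop one level up: the tests and standards are the fixed boundary, the agents' output is the interior, and you keep iterating until nothing changes.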