Remix.run Logo
win311fwg 2 hours ago

> Have an LLM write a 50-100KLOC program and prove all correctness properties [...] and tell us what it cost.

Assuming the 50-100KLOC program is of real-world use and not something contrived for the sake of offering something to prove, it is unlikely that proving all correctness properties will be possible, fundamentally. So costs will be nothing — or infinite if you foolishly remain determined to try the impossible.

In the real world we restrict what properties we care about and what models we reason in. Some of those models are woven into the fabric of an LLM. I would think the cost multiplier in those cases is much lower for an LLM as compared to a human that doesn't have an inherit understanding and needs to give it thought. Wouldn't you?

pron 41 minutes ago | parent [-]

> I would think the cost multiplier in those cases is much lower for an LLM as compared to a human that doesn't have an inherit understanding and needs to give it thought. Wouldn't you?

No. I don't see why proving would require less relative effort for an LLM. In fact, years ago, long before LLMs, I wrote about why it is relatively easy to write sort-of-correct software yet hard to write provably correct software, and I don't see why it's any different for LLMs. Their power lies in inductive "intuition", while deduction requires effort, just as it does for humans: https://pron.github.io/posts/people-dont-write-programs

But there's no need to speculate. Those who think verification-by-LLM is feasible and cost-effective on an industrial scale, are welcome to try it and report what they find. So far I've seen only tiny examples, and even they don't show effortless (i.e. token-light) work by the agent.