Remix.run Logo
johnfn 2 hours ago

Hallucinations are more or less a solved problem for me ever since I made a simple harness to have Codex/Claude check its work by using static typechecking.

emp17344 2 hours ago | parent [-]

But there aren’t very many domains where this type of verification is even possible.

nextaccountic 26 minutes ago | parent [-]

Then you apply LLMs in domains where things can be checked

Indeed I expect to see a huge push into formally verified software just because sound mathematical proofs provide an excellent verifier to put into a LLM hardness. Just see how Aristotle has been successful at math, and it could be applied to coding too

Maybe Lean will become the new Python

https://harmonic.fun/news#blog-post-verina-bench-sota