UC Berkeley: “Top-level functional equivalence requires that, for any possible set of inputs x, the two pieces of code produce the same output. … testing, or input-output (I/O) equivalence, is the default correctness metric used by the community. … It is infeasible to guarantee full top-level functional equivalence (i.e., equivalence for any value of x) with testing since this would require testing on a number of inputs so large as to be practically infinite.”

https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-...

▲

esrauch 6 hours ago | parent [-]

In practice mutation fuzz testers are able to whitebox see where branches are in the underlying code, with a differential fuzz test under that approach its generally able to fuzz over test cases that go over all branches.

So I think under some computer science theory case for arbitrary functions its not possible, but for the actual shape of behavior in question from this library I think its realistic that a decent corpus of 'real' examples and then differential fuzzing would give you more confidence that anyone has in nearly any program's correctness here on real Earth.

	▲	odyssey7 4 hours ago \| parent [-]
		Yes, there are different levels of sureness being described. When I hear guarantee, it makes me think of correctness proofs. Confidence is more of a practical notion for how much you trust the system for a given use case. Testing can definitely provide confidence in this scenario.