Remix.run Logo
perching_aix an hour ago

> Thus all software that can be used is correct?

You also know what I meant, since I spelled it out in more detail a comment later. But even though you're being facetious, yes, that really is the case. If it works it works. That's the bar for the vast, vast majority of software, and has been since forever. Demonstrated practical correctness. If you stumble into a bug, you log it as a defect and then either wait for a fix or fix it yourself depending. That's all that regular people ever have. In the case of this project, this was achieved via fuzz testing.

It's literally no different to e.g. validating the NTFS driver that ships in the Linux kernel, or validating any other (re)implementation of anything. You just do a bunch of empirical testing and hope for the best. It is also why reimplementations always lag behind, which I'm not suggesting is not a real concern (or that defects wouldn't be). It's just not a gotcha.

Hell, I'm 99% sure this is exactly what the actual vendor does too, or at least I sure hope that they do have tests at least. Cause they're sure as shit not using a formally verified compiler toolchain, meaning they definitely don't have a formal proof about whether even the official implementation in itself is correct. Only empirical data at best too.

slopinthebag 41 minutes ago | parent [-]

> You just do a bunch of empirical testing and hope for the best.

I get that this is often the case, but it does feel like we should be able to do better. At least when humans write this code you can have the expectation that there was real intent behind making sure the semantics of the code are aligned with the specification. At least with current language models, they tend to just brute-force test suite acceptance until everything passes, in a way no human developer has the capacity for. Of course this is often how it works with humans too (i.e. the classic Oracle story), but it does feel wrong.

Can we be sure that this method has produced a correct artefact without years of extensive usage? Probably not, hence my reluctance to rely on something like this, at least initially.