Remix.run Logo
slopinthebag an hour ago

Thus all software that can be used is correct?

You know what I meant: How can we have confidence that this implementation of RAR is functionally identical to what it's based on? What would give me the confidence to use it in a critical piece of infrastructure?

jaggederest an hour ago | parent | next [-]

Validating compression systems is usually really straightforward. There are 3 layers - decode known values from compressed files (or encode, same), round trip without any alterations, and fuzzing with arbitrary binaries

Because it's a defined format there can be binary exact comparisons between the input and output files - we already have an oracle in the form of proper RAR format software, so if they are identical, you don't need to look further for that specific case.

You can see a version of this that I did quite similarly, for postgresql wire format, here: https://github.com/pgdogdev/pgdog/tree/main/integration/sql

It validates that sql with the same setup, teardown, and test results in perfectly exact compatibility between raw postgresql as the control and various configurations of PgDog, with both the text format and binary format, so ultimately a 6-way multivariate test that should always result in binary-exact results.

slopinthebag an hour ago | parent [-]

Right, that's very different from "using it" and it's also different from "Have an LLM generate code that compiles".

perching_aix 36 minutes ago | parent | prev [-]

> Thus all software that can be used is correct?

You also know what I meant, since I spelled it out in more detail a comment below. But even though you're being facetious, yes, that really is the case. If it works it works. That's the bar for the vast, vast majority of software. Demonstrated practical correctness. If you stumble into a bug, you log it as a defect and fix it. That's all that regular people ever have.

It's literally no different to e.g. validating the NTFS driver that ships in the Linux kernel, or validating any other (re)implementation of anything. You just do a bunch of empirical testing and hope for the best. It is also why reimplementations always lag behind.

Hell, I'm 99% sure this is exactly what the actual vendor does too, or at least I sure hope that they do have tests at least. Cause they're sure as shit not using a formally verified compiler toolchain, meaning they definitely don't have a formal proof about whether even the official implementation in itself is correct. Only empirical data at best too.