XYen0n 2 hours ago

The OCI manifest references the hashes of these compressed layers, and re-compressing them is not guaranteed to reproduce the same hash.
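To illustrate the point, here is a minimal sketch (hypothetical blob contents; Python's gzip and hashlib stand in for a real registry client) of how a layer digest is computed: the digest in the manifest is taken over the *compressed* bytes, so any change to the compressed representation changes it.

```python
import gzip
import hashlib

# Hypothetical layer content; in reality this is a tar archive of a filesystem layer.
layer_tar = b"example filesystem layer contents"

# Registries store the layer compressed; the manifest digest covers the compressed bytes.
blob = gzip.compress(layer_tar, mtime=0)  # fixed mtime for reproducibility
digest = "sha256:" + hashlib.sha256(blob).hexdigest()
print(digest)
```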

mort96 an hour ago | parent | next

If that's the purpose, couldn't you store the hash and throw away the compressed image?

(As others said, compression is deterministic for the same algorithm, parameters and input data)

a_t48 27 minutes ago | parent

Zstd, for example, only promises deterministic output within the same version of the library. I've personally seen hashes change between pull and export. Things like tar padding also make a difference. Really, the thing to do is to hash the _uncompressed_ data and treat compression as a transport/registry detail. That's what I've done, at least.
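A small sketch of why hashing the uncompressed data is more robust: gzip embeds a timestamp in its header, so compressing identical input at different times can yield different bytes (and digests), while the digest of the uncompressed payload stays stable. (The forced mtime here is a stand-in for the library-version drift described above for zstd.)

```python
import gzip
import hashlib

data = b"identical layer contents" * 100

# Same input, but gzip's header embeds an mtime: forcing two different
# timestamps simulates non-reproducible compressed output.
c1 = gzip.compress(data, mtime=0)
c2 = gzip.compress(data, mtime=1)

# The compressed digests differ even though the payload is identical...
assert hashlib.sha256(c1).digest() != hashlib.sha256(c2).digest()

# ...while both round-trip to the same bytes, so the digest of the
# *uncompressed* data is a stable identity for the layer.
assert gzip.decompress(c1) == gzip.decompress(c2) == data
print(hashlib.sha256(data).hexdigest())
```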

mort96 6 minutes ago | parent

I didn't know that about zstd, that's a bit unfortunate.

Tar isn't relevant here, though; we're talking about compression, not archive formats.

flakes 2 hours ago | parent | prev

Recompressing should be deterministic. It's the packing/unpacking of tar archives to/from directories on disk that leads to the non-determinism (timestamps, ownership metadata, and the like). If the tar stream is left intact, both zstd and gzip should produce byte-for-byte identical output given the same compression parameters.
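To sketch that claim (using Python's zlib DEFLATE as a stand-in for gzip/zstd): compressing the same bytes twice with the same library and the same level yields identical output, and therefore identical digests.

```python
import hashlib
import zlib

# Stand-in for an intact tar archive's bytes.
data = b"unchanged tar archive bytes" * 100

# Same library, same input, same parameters -> identical compressed bytes.
a = zlib.compress(data, 6)
b = zlib.compress(data, 6)
assert a == b
assert hashlib.sha256(a).digest() == hashlib.sha256(b).digest()
print(hashlib.sha256(a).hexdigest())
```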

XYen0n 36 minutes ago | parent

You are correct; I conflated archiving with compression. However, even considering only the compression step, using the same compression parameters cannot be guaranteed, since it is unknown which parameters the image publisher used.
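A quick sketch of that failure mode (again with Python's zlib as a stand-in, and an extreme pair of parameter choices, stored vs. maximum compression, to make the divergence unambiguous): two valid parameter settings decompress to the same payload but produce different compressed bytes, so re-compressing without knowing the publisher's parameters need not reproduce the published digest.

```python
import hashlib
import zlib

data = b"layer contents with plenty of repetition " * 200

# Two valid parameter choices the publisher might have made.
stored = zlib.compress(data, 0)  # level 0: stored, no compression
best = zlib.compress(data, 9)    # level 9: maximum compression

# Both round-trip to the same payload...
assert zlib.decompress(stored) == zlib.decompress(best) == data

# ...but the compressed bytes, and therefore the digests, differ.
assert stored != best
assert hashlib.sha256(stored).digest() != hashlib.sha256(best).digest()
print(len(stored), len(best))
```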