mort96 2 hours ago

Okay now this is weird.

I can reproduce it just fine ... but only when compressing all PDFs simultaneously.

To utilize all cores, I ran:

    $ for x in *.pdf; do zstd <"$x" >"$x.zst" --ultra -22 & done; wait
(and similar for the other formats).
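
(A version of the same fan-out with concurrency capped at the core count, sketched here assuming BSD xargs and sysctl as found on macOS, would be:)

    $ printf '%s\0' *.pdf | xargs -0 -n1 -P "$(sysctl -n hw.ncpu)" zstd -q --ultra -22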

I ran the parallel loop again and it produced the same 2M file from the 1.1M source. However, when I run it without parallelization:

    $ for x in *.pdf; do zstd <"$x" >"$x.zst" --ultra -22; done
That one file becomes 1.1M, and the total size of the *.zst files is 37M (competitive with Brotli, which is impressive given how much faster zstd is to decompress).

What's going on here? Surely '-22' disables any adaptive compression stuff based on system resource availability and just uses compression level 22?

terrelln an hour ago | parent | next

Yeah, `--adaptive` enables adaptive compression, but it isn't on by default, so it shouldn't apply here. And even with `--adaptive`: after compressing each 128KB block of input, zstd checks that the output is smaller than 128KB; if it isn't, it emits an uncompressed block of 128KB + 3B instead.

So it is a core invariant of zstd that it never emits a block larger than 128KB + 3B; the output can exceed the input by at most a few bytes per 128KB block, never by anything close to 2x.
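
(A quick way to see that bound in practice, sketched with a throwaway random.bin; exact byte counts vary slightly across versions:)

    $ head -c 1048576 /dev/urandom > random.bin
    $ zstd --ultra -22 < random.bin | wc -c
That should print only a few dozen bytes more than 1048576 (roughly 3B per 128KB block plus the frame header), nowhere near a 2x blow-up.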

I will try to reproduce, but I suspect that there is something unrelated to zstd going on.

What version of zstd are you using?

mort96 36 minutes ago | parent

'zstd --version' reports: "** Zstandard CLI (64-bit) v1.5.7, by Yann Collet **". This is zstd installed through Homebrew on macOS 26 on an M1 Pro laptop. Also of interest, I was able to reproduce this with a random binary I had in /bin: https://floss.social/@mort/115940378643840495

I was completely unable to reproduce it on my Linux desktop though: https://floss.social/@mort/115940627269799738

terrelln 8 minutes ago | parent

I've figured out the issue. Use `wc -c` instead of `du`.

I can repro on my Mac, with either `zstd` or `gzip`, using these steps:

    $ rm -f ksh.zst
    $ zstd < /bin/ksh > ksh.zst
    $ du -h ksh.zst
    1.2M ksh.zst
    $ wc -c ksh.zst
     1240701 ksh.zst
    $ zstd < /bin/ksh > ksh.zst
    $ du -h ksh.zst
    2.0M ksh.zst
    $ wc -c ksh.zst
     1240701 ksh.zst
    
    $ rm -f ksh.gz
    $ gzip < /bin/ksh > ksh.gz
    $ du -h ksh.gz
    1.2M ksh.gz
    $ wc -c ksh.gz
     1246815 ksh.gz
    $ gzip < /bin/ksh > ksh.gz
    $ du -h ksh.gz
    2.1M ksh.gz
    $ wc -c ksh.gz
     1246815 ksh.gz
When a file is overwritten, its on-disk size is bigger, even though the byte count is unchanged. I don't know why. But you must have run zstd's benchmark twice, and every other compressor's benchmark once.
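
(The practical takeaway for benchmarking: `du` reports allocated disk blocks, while `wc -c` reports the actual byte count. To total the logical sizes of many files, something like either of these works; `stat -f %z` is the macOS/BSD spelling:)

    $ wc -c *.zst | tail -1
    $ stat -f %z *.zst | awk '{s+=$1} END {print s}'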

I'm a zstd developer, so I have a vested interest in accurate benchmarks, and finding & fixing issues :)

Zekio 2 hours ago | parent | prev

doesn't zstd cap out at compression level 19?

mort96 2 hours ago | parent

From the man page:

    --ultra: unlocks high compression levels 20+ (maximum 22), using a lot more memory.
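(If I remember the CLI's behavior right, a level above 19 without the flag isn't an error; zstd just warns and reduces it to 19:)

    $ zstd -22 file.pdf          # warns, falls back to level 19
    $ zstd --ultra -22 file.pdf  # accepted, at the cost of much more memory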
Regardless, this reproduces with other random files and with '-9' as the compression level. I made a Mastodon post about it here: https://floss.social/@mort/115940378643840495