| ▲ | mort96 2 hours ago |
| Okay, now this is weird. I can reproduce it just fine ... but only when compressing all PDFs simultaneously. To utilize all cores, I ran:

$ for x in *.pdf; do zstd <"$x" >"$x.zst" --ultra -22 & done; wait

(and similar for the other formats). I ran this again and it produced the same 2M file from the 1.1M source file. However, when I run without parallelization:

$ for x in *.pdf; do zstd <"$x" >"$x.zst" --ultra -22; done

that one file becomes 1.1M, and the total size of *.zst is 37M (competitive with Brotli, which is impressive given how much faster zstd is to decompress). What's going on here? Surely '-22' disables any adaptive compression based on system resource availability and just uses compression level 22?
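For anyone reproducing, here is an equivalent bounded-parallel form as a sketch; it assumes an xargs with -P (both macOS's BSD xargs and GNU xargs have it) and macOS's sysctl for the core count:

$ printf '%s\0' *.pdf | xargs -0 -n1 -P "$(sysctl -n hw.ncpu)" zstd -q --ultra -22

This writes each file's .zst alongside it and keeps the original, same as the loop above, but caps the number of concurrent zstd processes at the core count instead of launching one per file. |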
|
| ▲ | terrelln an hour ago | parent | next [-] |
| Yeah, `--adaptive` enables adaptive compression, but it isn't on by default, so it shouldn't apply here. And even with `--adaptive`, after compressing each 128KB block of data, zstd checks that the output is < 128KB; if it isn't, it emits an uncompressed block of 128KB + 3B instead. So it is a core invariant of zstd that it never emits a block larger than 128KB + 3B. I will try to reproduce, but I suspect that something unrelated to zstd is going on. What version of zstd are you using?
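For a rough sanity check of that bound, here is a sketch; /bin/ksh is just an example input, and the 32 bytes is a generous allowance for the frame header and checksum, not an exact figure:

$ in=/bin/ksh
$ insize=$(wc -c < "$in")
$ outsize=$(zstd --ultra -22 < "$in" | wc -c)
$ # per the invariant above, output should never exceed roughly
$ # input + 3 bytes per 128KB block + a small frame overhead:
$ [ "$outsize" -le $(( insize + (insize / 131072 + 1) * 3 + 32 )) ] && echo "within bound"
|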
| |
| ▲ | mort96 36 minutes ago | parent [-] |
| 'zstd --version' reports: "** Zstandard CLI (64-bit) v1.5.7, by Yann Collet **". This is zstd installed through Homebrew on macOS 26 on an M1 Pro laptop. Also of interest: I was able to reproduce this with a random binary I had in /bin: https://floss.social/@mort/115940378643840495
I was completely unable to reproduce it on my Linux desktop, though: https://floss.social/@mort/115940627269799738 |
| ▲ | terrelln 8 minutes ago | parent [-] |
| I've figured out the issue: use `wc -c` instead of `du`. I can repro on my Mac with these steps, with either `zstd` or `gzip`:

$ rm -f ksh.zst
$ zstd < /bin/ksh > ksh.zst
$ du -h ksh.zst
1.2M ksh.zst
$ wc -c ksh.zst
1240701 ksh.zst
$ zstd < /bin/ksh > ksh.zst
$ du -h ksh.zst
2.0M ksh.zst
$ wc -c ksh.zst
1240701 ksh.zst
$ rm -f ksh.gz
$ gzip < /bin/ksh > ksh.gz
$ du -h ksh.gz
1.2M ksh.gz
$ wc -c ksh.gz
1246815 ksh.gz
$ gzip < /bin/ksh > ksh.gz
$ du -h ksh.gz
2.1M ksh.gz
$ wc -c ksh.gz
1246815 ksh.gz
When a file is overwritten, the on-disk size is bigger. I don't know why. But you must have run zstd's benchmark twice, and every other compressor's benchmark once. I'm a zstd developer, so I have a vested interest in accurate benchmarks, and in finding & fixing issues :)
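For what it's worth, the two tools measure different things, so they can legitimately disagree: `du` reports allocated disk space in filesystem blocks, while `wc -c` reports the file's logical length in bytes. To see both side by side, a sketch assuming the BSD stat(1) that ships with macOS:

$ wc -c < ksh.zst                          # logical length in bytes
$ stat -f '%z bytes, %b blocks' ksh.zst    # logical size vs 512-byte blocks allocated
$ du -h ksh.zst                            # derived from the allocated block count
|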
|
| ▲ | Zekio 2 hours ago | parent | prev [-] |
| doesn't zstd cap out at compression level 19? |
| |