Remix.run Logo
saidnooneever 3 hours ago

the catting issue might be more an implementation of bzip program problem than algorithm (it could expect an array of compressed files). that would only be impossible if the program cannot reason about the length of data from file header, which again is technically not something about compression algo but rather file format its carried through.

that being said, speed is important for compression so for systems like webservers etc its an easy sell ofc. very strong point (and smarter implementation in programs) for gzip

nine_k 3 hours ago | parent | next [-]

Bzip2 is great for files that are compressed once, get decompressed many times, and the size is important. A good example is a software release.

pocksuppet 2 hours ago | parent [-]

So is xz, or zstd, and the files are smaller. bzip2 disappeared from software releases when xz was widely available. gzip often remains, as the most compatible option, the FAT32 of compression algorithms.

joecool1029 3 hours ago | parent | prev [-]

> the catting issue might be more an implementation of bzip program problem than algorithm (it could expect an array of compressed files). that would only be impossible if the program cannot reason about the length of data from file header, which again is technically not something about compression algo but rather file format its carried through.

Long comment to just say: ‘I have no idea about what I’m writing about’

These compression algorithms do not have anything to do with filesystem structure. Anyway the reason you can’t cat together parts of bzip2 but you can with zstd (and gzip) is because zstd does everything in frames and everything in those frames can be decompressed separately (so you can seek and decompress parts). Bzip2 doesn’t do that.

So like, another place bzip2 sucks ass is working with large archives because you need to seek the entire archive before you can decompress it and it makes situations without parity data way more likely to cause dataloss of the whole archive. Really, don’t use it unless you have a super specific use case and know the tradeoffs, for the average person it was great when we would spend the time compressing to save the time sending over dialup.