nrclark 3 hours ago

You only have to decompress it first if it's compressed (commonly with gzip, indicated by a .gz suffix).

Otherwise, you can randomly access any file in a .tar as long as:

- the file is seekable/range-addressable
- you scan through it and build the file index first, either at runtime or in advance.
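A minimal sketch of the scan-then-seek approach using Python's standard `tarfile` module. `build_tar_index` and `read_member` are hypothetical helper names, and the archive is assumed to be an uncompressed, seekable local file. For a remote archive, the same `(offset, size)` pairs could be turned into HTTP range requests instead of `seek` calls.

```python
import tarfile

def build_tar_index(path):
    """Scan an uncompressed tar once and map member name -> (data offset, size)."""
    index = {}
    # "r:" opens the archive with no decompression, so offsets are real file offsets.
    with tarfile.open(path, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                # offset_data is where this member's bytes start, after its header block.
                index[member.name] = (member.offset_data, member.size)
    return index

def read_member(path, index, name):
    """Random access: jump straight to one member's data via the prebuilt index."""
    offset, size = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

The index only has to be built once; it can be kept in memory or written out next to the archive ("in advance").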

Uncompressed .tar is a reasonable choice for this application because the tools to read/write tar files are very standard, the file format is simple and well-documented, and it incurs no computational overhead.

electroly 2 hours ago

You've just constructed your own crappy in-memory zip file, here. If you have to build your own custom index, you're no longer using the standard tools. If you find yourself building indices of tar files, and you control the creation, give yourself a break and use a zip file instead. It has the index built in. Compression is not required when packing files into a zip, if you don't want it.
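For comparison, a sketch of the zip route with Python's `zipfile`: `ZIP_STORED` turns compression off, and reading a single member goes through the archive's own central directory, with no custom index. The helper names are hypothetical.

```python
import zipfile

def pack_stored(zip_path, files):
    """Pack a {name: bytes} dict into a zip with compression disabled."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)

def read_one(zip_path, name):
    """Look up one member through the zip's built-in central directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(name)
```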

marginalia_nu 2 hours ago

Yeah, it's pretty common to use zip files purely as a container format, with no compression enabled. You can even construct them in such a way that it's possible to memory-map the contents directly out of the zip file, or read them over the network via a small number of range requests.
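A sketch of the memory-mapping idea, assuming the member was written uncompressed (`ZIP_STORED`). The central directory gives the offset of the member's local header; the data begins after that header's 30 fixed bytes plus its own file-name and extra fields. `stored_member_view` is a hypothetical helper, not part of `zipfile`. A remote reader would use the same arithmetic: one range request for the tail of the file (where the central directory lives), then one per member.

```python
import mmap
import struct
import zipfile

LOCAL_HEADER_SIZE = 30        # fixed-size part of a zip local file header
LOCAL_HEADER_SIG = 0x04034B50

def stored_member_view(zip_path, name):
    """Return a zero-copy memoryview over an uncompressed member inside a zip."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(name)  # comes from the central directory
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is compressed; it cannot be used in place")
    with open(zip_path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    hdr = mm[info.header_offset : info.header_offset + LOCAL_HEADER_SIZE]
    (sig,) = struct.unpack_from("<I", hdr, 0)
    if sig != LOCAL_HEADER_SIG:
        raise ValueError("bad local file header signature")
    # The local header's extra field can differ from the central directory's,
    # so read its own name/extra lengths (at offsets 26 and 28).
    name_len, extra_len = struct.unpack_from("<HH", hdr, 26)
    start = info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
    return memoryview(mm)[start : start + info.file_size]
```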

kevin_thibedeau 26 minutes ago

Romfs is more capable, simple to support, and doesn't have the overhead of tar's large headers and typical large blocking factors.

johannes1234321 2 hours ago

> Uncompressed .tar is a reasonable choice for this application

Yes, uncompressed tar (with transfer compression, which is offered in HTTP) is an option for some amount of data.

Until the point where it isn't. zip has similar benefits to tar (plus transfer compression), but it reaches the point where it fails for such a scenario later.

chungy an hour ago

Zip allows you to set compression algorithm on a per-file basis, including no compression.
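In Python's `zipfile`, the method can be chosen per entry through the `compress_type` argument to `writestr`. A sketch; the helper name, the member names in the usage note, and the rule of storing already-compressed data are illustrative assumptions.

```python
import zipfile

def pack_mixed(zip_path, members):
    """Write (name, data, compress) tuples; each entry gets its own method."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, data, compress in members:
            # e.g. store data that is already compressed (JPEG, PNG), deflate the rest
            method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
            # compress_type overrides the archive-wide default for this entry only
            zf.writestr(name, data, compress_type=method)
```

For example, `pack_mixed("assets.zip", [("photo.jpg", jpeg_bytes, False), ("notes.txt", text_bytes, True)])` would store the JPEG as-is and deflate the text file in the same archive.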

QuantumNomad_ 44 minutes ago

You can achieve the same with tar if you compress the files individually before adding them to the tarball, instead of compressing the tarball itself.

I don’t see how that plus a small index of offsets would be notably more or less work than using a zip file.
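A sketch of this tar-based alternative with the standard library, assuming each file is gzipped on its own and stored under a `.gz` suffix inside an uncompressed tar. The helper names and the suffix convention are hypothetical. The offset index still comes from one pass over the tar headers, though it can be saved and reused.

```python
import gzip
import io
import tarfile

def pack_pre_compressed(tar_path, files):
    """Gzip each file separately, then add it to an uncompressed tar."""
    with tarfile.open(tar_path, "w") as tar:
        for name, data in files.items():
            blob = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))

def build_offset_index(tar_path):
    """The 'small index of offsets': member name -> (data offset, stored size)."""
    with tarfile.open(tar_path, "r:") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar if m.isfile()}

def read_file(tar_path, index, name):
    """Seek to one gzipped member and decompress only that member."""
    offset, size = index[name + ".gz"]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return gzip.decompress(f.read(size))
```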

chungy 11 minutes ago

Zip has a central directory you could just query, instead of having to construct one in-memory by scanning the entire archive. That's significantly less work.
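To make the difference concrete, a sketch that queries a zip's central directory directly. It finds the end-of-central-directory record in the last bytes of the file, then parses only the directory and never touches member data. Assumptions: no zip64 extensions, a single-disk archive, and a simple backwards search for the record signature. `read_central_directory` is a hypothetical name; Python's `zipfile` already does the equivalent when it opens an archive.

```python
import os
import struct

EOCD_SIG = b"PK\x05\x06"
CDH_SIG = b"PK\x01\x02"
EOCD = struct.Struct("<4s4H2IH")    # end-of-central-directory record, 22 bytes
CDH = struct.Struct("<4s6H3I5H2I")  # central directory file header, 46 bytes

def read_central_directory(path):
    """Map member name -> (local header offset, compressed size, method)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes.
        tail_len = min(size, EOCD.size + 0xFFFF)
        f.seek(size - tail_len)
        tail = f.read(tail_len)
        pos = tail.rfind(EOCD_SIG)  # simple search; a comment could hold a false match
        if pos < 0:
            raise ValueError("no end-of-central-directory record found")
        _, _, _, _, count, cd_size, cd_offset, _ = EOCD.unpack_from(tail, pos)
        f.seek(cd_offset)
        cd = f.read(cd_size)

    entries = {}
    p = 0
    for _ in range(count):
        (sig, _, _, flags, method, _, _, _, csize, _,
         nlen, elen, clen, _, _, _, offset) = CDH.unpack_from(cd, p)
        if sig != CDH_SIG:
            raise ValueError("corrupt central directory entry")
        # Flag bit 11 marks UTF-8 names; otherwise the default encoding is CP437.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = cd[p + CDH.size : p + CDH.size + nlen].decode(encoding)
        entries[name] = (offset, csize, method)
        p += CDH.size + nlen + elen + clen
    return entries
```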