mrcarrot | 4 days ago
The "Optimized Tarball Extraction" confuses me a bit. It begins by illustrating how other package managers have to repeatedly copy the received, compressed data into larger and larger buffers (not mentioning anything about the buffer where the decompressed data goes), and then says that: > Bun takes a different approach by buffering the entire tarball before decompressing. But seems to sidestep _how_ it does this any differently than the "bad" snippet the section opened with (presumably it checks the Content-Length header when it's fetching the tarball or something, and can assume the size it gets from there is correct). All it says about this is: > Once Bun has the complete tarball in memory it can read the last 4 bytes of the gzip format. Then it explains how it can pre-allocate a buffer for the decompressed data, but we never saw how this buffer allocation happens in the "bad" example! > These bytes are special since store the uncompressed size of the file! Instead of having to guess how large the uncompressed file will be, Bun can pre-allocate memory to eliminate buffer resizing entirely Presumably the saving is in the slow package managers having to expand _both_ of the buffers involved, while bun preallocates at least one of them? | ||||||||
Jarred | 3 days ago
Here is the code: https://github.com/oven-sh/bun/blob/7d5f5ad7728b4ede521906a4...

We trust the size self-reported by gzip up to 64 MB, try to allocate enough space for all of the output, then run it through libdeflate. This is instead of a loop that decompresses chunk-by-chunk, extracts chunk-by-chunk, and resizes a big tarball buffer many times over.
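For anyone who wants the mechanics without reading the Zig, here is a rough C sketch of the same idea (the helper name `decompress_tarball`, the cap constant, and the error handling are mine, not Bun's): read the 4-byte ISIZE trailer that RFC 1952 puts at the end of every gzip stream, trust it as the output size up to a 64 MB cap, allocate once, and hand the whole buffer to libdeflate's one-shot gzip decompressor.

    /* Sketch only, not Bun's actual code: pre-allocate the decompression
     * buffer from the gzip ISIZE trailer, then decompress in one shot.
     * Assumes `tarball` already holds the complete .tgz in memory. */
    #include <libdeflate.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define SIZE_TRUST_LIMIT (64u * 1024u * 1024u)  /* trust ISIZE up to 64 MB */

    static int decompress_tarball(const uint8_t *tarball, size_t len,
                                  uint8_t **out, size_t *out_len)
    {
        if (len < 4)
            return -1;

        /* RFC 1952: the last 4 bytes of a gzip stream are the uncompressed
         * size modulo 2^32, stored little-endian. */
        uint32_t isize = (uint32_t)tarball[len - 4]
                       | (uint32_t)tarball[len - 3] << 8
                       | (uint32_t)tarball[len - 2] << 16
                       | (uint32_t)tarball[len - 1] << 24;

        /* The field is self-reported, so only trust it up to a cap. */
        size_t capacity = isize <= SIZE_TRUST_LIMIT ? isize : SIZE_TRUST_LIMIT;
        if (capacity == 0)
            capacity = 1;

        uint8_t *buf = malloc(capacity);
        if (!buf)
            return -1;

        struct libdeflate_decompressor *d = libdeflate_alloc_decompressor();
        if (!d) { free(buf); return -1; }

        size_t actual = 0;
        enum libdeflate_result r =
            libdeflate_gzip_decompress(d, tarball, len, buf, capacity, &actual);
        libdeflate_free_decompressor(d);

        if (r != LIBDEFLATE_SUCCESS) { free(buf); return -1; }

        *out = buf;
        *out_len = actual;
        return 0;
    }

ISIZE is only the size modulo 2^32 and can be garbage in a hostile archive, which is presumably why the value is trusted only up to a cap (with a fallback path, not shown here) rather than taken at face value.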