Posted by Bogdanp 2 days ago
What's the reason for this?
I could imagine many tools would benefit from knowing the decompressed file size in advance.
> ISIZE (Input SIZE)
> This contains the size of the original (uncompressed) input data modulo 2^32.
So there's two big caveats:
1. Your data is a single gzip member (I guess this means everything in a folder)
2. Your data is < 2^32 bytes.
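For illustration, here is a minimal Python sketch of reading ISIZE from the last four bytes of a gzip stream. The helper name `read_isize` is made up for this example, and the result is only trustworthy under the two caveats above (one member, less than 2^32 bytes of input).

```python
import gzip
import struct


def read_isize(gz_bytes: bytes) -> int:
    """Return ISIZE: the last 4 bytes of a gzip stream as a little-endian uint32.

    Hypothetical helper for illustration; it does not validate the stream.
    """
    if len(gz_bytes) < 18:  # 10-byte header plus 8-byte trailer at minimum
        raise ValueError("too short to be a gzip member")
    (isize,) = struct.unpack("<I", gz_bytes[-4:])
    return isize


payload = b"hello gzip" * 1000
compressed = gzip.compress(payload)

# Single member and well under 2**32 bytes, so ISIZE matches the real size.
print(read_isize(compressed) == len(payload))

# The field is only 32 bits wide: an input of 4 GiB + 5 bytes would store 5.
print((2**32 + 5) % 2**32)
```

A reasonable use of the value is as a preallocation hint, with the actual decompressed length still checked afterwards.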
However, because of the scale of what bun deals with, it's on the edge of what I would consider safe, and I hope the real code has a fallback for files with multiple members in them, because sooner or later it'll happen.
It's not necessarily terribly well known that you can just slam gzip members (or files) together and it's still a legal gzip stream, but it's something I've made use of in real code, so I know it's happened. You can do some simple things with having indices into a compressed file so you can skip over portions of the compressed stream safely, without other programs having to "know" that's a feature of the file format.
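A quick Python sketch of the concatenation point, using only the standard `gzip` and `struct` modules: two separately compressed members are joined byte for byte, the result still decompresses as one stream, and the trailing ISIZE describes only the last member.

```python
import gzip
import struct

first = b"a" * 100
second = b"b" * 7

# Two independently compressed members, concatenated byte-for-byte.
stream = gzip.compress(first) + gzip.compress(second)

# gzip.decompress handles multi-member data and returns all of it.
data = gzip.decompress(stream)

# The last 4 bytes of the stream are the ISIZE of the final member only.
(trailer_isize,) = struct.unpack("<I", stream[-4:])

print(len(data), trailer_isize)  # 107 7
```

So a reader that sizes its buffer from the trailer alone would allocate 7 bytes for 107 bytes of output here, which is why a fallback matters.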
Although the whole thing is weird in general because you can stream gzip'd tars without ever having to allocate space for the whole thing anyhow. gzip can be streamed without having seen the footer yet and the tar format can be streamed out pretty easily. I've written code for this in Go a couple of times, where I can be quite sure there's no stream rewinding occurring by the nature of the io.Reader system. Reading the whole file into memory to unpack it was never necessary in the first place, not sure if they've got some other reason to do that.
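The same idea can be sketched in Python rather than Go, using `tarfile`'s stream mode (`"r|gz"`), which reads strictly forward. `build_tgz`, `ForwardOnly`, and `stream_members` are hypothetical names for this example, and `ForwardOnly` is just a stand-in for a socket or pipe that cannot seek.

```python
import io
import tarfile


def build_tgz() -> bytes:
    """Build a small .tar.gz in memory so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, body in [("a.txt", b"alpha"), ("b.txt", b"bravo!")]:
            info = tarfile.TarInfo(name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return buf.getvalue()


class ForwardOnly(io.RawIOBase):
    """Read-only, non-seekable wrapper, standing in for a socket or pipe."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        return self._inner.readinto(b)


def stream_members(fileobj):
    """Walk the archive in order, reading each file in small chunks."""
    sizes = {}
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            f = tar.extractfile(member)
            if f is None:  # directories and other non-file entries
                continue
            total = 0
            while chunk := f.read(4096):
                total += len(chunk)  # a real unpacker would write chunk to disk
            sizes[member.name] = total
    return sizes


print(stream_members(ForwardOnly(build_tgz())))
```

Nothing here needs the decompressed size up front; memory use is bounded by the chunk size and the library's internal buffers.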
I was just wondering why GZIP specified it that way.
Thanks!
---
def _read_eof(self):
    # We've read to the end of the file, so we have to rewind in order
    # to reread the 8 bytes containing the CRC and the file size.
    # We check that the computed CRC and size of the
    # uncompressed data matches the stored values. Note that the size
    # stored is the true file size mod 2**32.
---
~/: bun install
error: An unknown error occurred (Unexpected)
...
> On a 3GHz processor, 1000-1500 cycles is about 500 nanoseconds. This might sound negligibly fast, but modern SSDs can handle over 1 million operations per second. If each operation requires a system call, you're burning 1.5 billion cycles per second just on mode switching.
> Package installation makes thousands of these system calls. Installing React and its dependencies might trigger 50,000+ system calls: that's seconds of CPU time lost to mode switching alone! Not even reading files or installing packages, just switching between user and kernel mode.
Am I missing something, or is this incorrect? They claim 500ns per syscall with 50k syscalls. 500ns * 50000 = 25 milliseconds. So that is very far from "seconds of CPU time lost to mode switching alone!", right?
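Spelling the arithmetic out as a small Python check, taking the quoted figures at face value (3GHz clock, the upper bound of 1500 cycles per syscall, 50,000 syscalls):

```python
CLOCK_HZ = 3e9             # 3GHz processor, from the quoted passage
CYCLES_PER_SYSCALL = 1500  # upper end of the quoted 1000-1500 range
SYSCALLS = 50_000          # quoted estimate for installing React

seconds_per_syscall = CYCLES_PER_SYSCALL / CLOCK_HZ
total_seconds = seconds_per_syscall * SYSCALLS

print(f"{seconds_per_syscall * 1e9:.0f} ns per syscall")  # 500 ns per syscall
print(f"{total_seconds * 1e3:.0f} ms total")              # 25 ms total
```

Even with the pessimistic cycle count, the mode-switching overhead from these numbers comes out at tens of milliseconds, not seconds.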
Still only about 2 secs, but still.