nickjj 4 days ago

> The startup time of a simple .py script can easily be in the 100 to 300 ms range

I can't say I've ever experienced this. Are you sure it's not related to other things in the script?

I wrote a single-file Python script that's a few thousand lines long. It can process a 10,000-line CSV file and do a lot of calculations, to the point where I wrote an entire CLI income / expense tracker with it[0].

The end-to-end time of the command is 100ms to process those 10k lines, measured with `time`. That's on hardware from 2014 using Python 3.13, too. It takes ~550ms to fully process 100k lines as well. I spent zero time optimizing the script, but did try to avoid common pitfalls (deeply nested loops, etc.).

[0]: https://github.com/nickjj/plutus

zahlman 4 days ago | parent | next [-]

> I can't say I've ever experienced this. Are you sure it's not related to other things in the script? I wrote a single file Python script, it's a few thousand lines long.

It's primarily because of module imports. It's worse with many small files than a few large ones. (Python 3 adds a little extra overhead because of the additional system calls and complexity in the import process needed to handle `__pycache__` folders.) A great way to demonstrate it is to ask pip to do something trivial (like `pip --version`, or `pip install` with no packages specified), or to compare the performance of pip installed in a venv to pip used cross-environment (with `--python`). Pip imports literally hundreds of modules at startup, and hundreds more the first time it hits the network.
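
If you want to see where that time goes on your own machine, here's a quick sketch (the `json` import is just a stand-in for whatever your script pulls in):

    import sys, time

    before = set(sys.modules)          # modules already loaded at interpreter startup
    t0 = time.perf_counter()
    import json                        # stand-in for whatever your script imports
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"{len(set(sys.modules) - before)} new modules in {elapsed_ms:.1f} ms")

There's also `python -X importtime`, which prints a per-module timing line to stderr; that's probably the easiest way to see exactly where pip's startup time goes.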

nickjj 4 days ago | parent | next [-]

Makes sense; most of my scripts are standalone, zero-dependency scripts that import a few things from the standard library.

`time pip3 --version` takes 230ms on my machine.

maccard 4 days ago | parent [-]

That proves the point, right?

`time pip3 --version` takes ~200ms on my machine. `time go help` takes 25ms, and prints out 30x more lines than `pip3 --version`.

nickjj 4 days ago | parent [-]

Yep, running `time` on my tool's `--version` takes 50ms, and funnily enough processing 10k CSV lines with ~2k lines of Python code takes 100ms, so 50ms of that is just Python preparing to run by importing 20 or so standard library modules.

zahlman 3 days ago | parent [-]

> so 50ms of that is just Python preparing things to run by importing 20 or so standard library modules.

Probably a decent chunk of that actually is the Python runtime starting up. I don't know what you `import` beyond what's implied at startup, though.

Another chunk might be garbage collection at process exit.
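
If you want to check how much exit-time cleanup contributes, a rough sketch (it skips all finalization, so use with care):

    import gc, os

    gc.disable()   # skip cyclic GC during the run
    # ... do the actual work here ...
    os._exit(0)    # exit immediately, skipping interpreter finalization and atexit handlers

Running `time` on the script with and without the `os._exit(0)` at the end gives a rough idea of what teardown costs.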

fwip 4 days ago | parent | prev [-]

And it's worse if your Python libraries are on network storage, like in a user's homedir in a shared compute environment.

dekhn 4 days ago | parent [-]

Exactly this. The time to start Python is roughly timeof(stat) * numberof(stat calls), and on a network filesystem that can often be orders of magnitude larger than on a local filesystem.

zahlman 4 days ago | parent [-]

I do wonder, on a local filesystem, how much of the time is statting paths vs. reading the file contents vs. unmarshaling code objects. (Top-level code also runs when a module is imported, but the cost of that is of course highly module-dependent.)

dekhn 4 days ago | parent [-]

Maybe you could take the stat timings and the read timings (both from strace) and somehow instrument Python to output timing for unmarshalling code (or just instrument everything in Python).
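
A rough sketch of the unmarshalling part, at least (the `json` module is just an example, and this assumes a cached .pyc exists for it):

    import importlib.util, marshal, time

    src = importlib.util.find_spec("json").origin    # example stdlib module
    pyc = importlib.util.cache_from_source(src)      # its __pycache__ .pyc path
    with open(pyc, "rb") as f:
        data = f.read()
    t0 = time.perf_counter()
    code = marshal.loads(data[16:])                  # skip the 16-byte .pyc header (3.7+)
    print(f"unmarshal: {(time.perf_counter() - t0) * 1000:.3f} ms")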

Either way, at least on my system with cached file attributes, Python can start up in 10ms, so it's not clear whether you truly need to optimize much more than that (by identifying the remaining bits to optimize), versus solving the problem another way (not statting 500 files, most of which don't exist, every time you start up).

randomtoast 4 days ago | parent | prev | next [-]

Here is a benchmark: https://github.com/bdrung/startup-time

This benchmark is a bit outdated, but the problem remains the same.

Interpreter initialization: Python builds and initializes its entire virtual machine and built-in object structures at startup. Native programs already have their machine code ready and need very little runtime scaffolding.

Dynamic import system: Python’s module import machinery dynamically locates, loads, parses, compiles, and executes modules at runtime. A compiled binary has already linked its dependencies.

Heavy standard library usage: Many Python programs import large parts of the standard library or third-party packages at startup, each of which runs top-level initialization code.

This is especially noticeable if you do not run on an M1 Ultra, but on slower hardware. From the results on a Raspberry Pi 3:

C: 2.19 ms

Go: 4.10 ms

Python3: 197.79 ms

This is about 200ms of startup latency for a `print("Hello World!")` in Python 3.

zahlman 4 days ago | parent [-]

Interesting. The tests use Python 3.6, which on my system reproduces the huge difference in startup time with and without `-S`. From 3.7 onwards, `-S` makes a much smaller percentage difference. There's also a noticeable difference the first time you run; I guess because of Linux caching various things. (That effect is much bigger with Rust executables, such as uv, in my testing.)

Anyway, your analysis of causes reads like something AI-generated and pasted in. It's awkward in the context of the rest of your post, and 2 of the 3 points are clearly irrelevant to a "hello world" benchmark.

maccard 4 days ago | parent | prev | next [-]

A Python file with

    import requests
Takes 250ms on my i9 on Python 3.13

A go program with

    package main
    import (
       _ "net/http"
    ) 
    func main() {
    }
takes < 10ms.

dotdi 4 days ago | parent [-]

This is not an apples-to-apples comparison. Python needs to load and interpret the whole requests module when you run the above program. The Go linker does dead code elimination, so it probably doesn't run anything and doesn't actually do the import when you launch it.

maccard 4 days ago | parent | next [-]

Sure, it's not an apples-to-apples comparison; Python is interpreted and Go is statically compiled. But that doesn't change the fact that in practice, running a "simple" Python program/script can take longer to start up than Go takes to run your entire program.

dotdi 4 days ago | parent [-]

Still, you are comparing a non-empty program to an empty program.

tuhgdetzhh 4 days ago | parent | next [-]

Even if you actually use the network module in Go, so the compiler can't strip it away, you'd still see startup latency well below 25 ms, in my experience writing CLI tools.

Whereas with Python, even in the latest version, you're already looking at at least 10x that startup latency in practice.

Note: this excludes the actual time spent on the network call, which can of course add quite a few milliseconds, depending on how far away on planet earth your destination is.

maccard 4 days ago | parent | prev [-]

You're missing the point. The point is that Python is slow to start up _because_ it's not the same.

Compare:

    import requests
    print(requests.get("http://localhost:3000").text)
to

    package main

    import (
      "fmt"
      "io"
      "net/http"
    )

    func main() {
        resp, _ := http.Get("http://localhost:3000")
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body))
    }
I get:

    python3:  0.08s user 0.02s system 91% cpu 0.113 total
    go 0.00s user 0.01s system 72% cpu 0.015 total
(different hardware as I'm at home).

I wrote another that counts the lines in a file, and tested it against https://www.gutenberg.org/cache/epub/2600/pg2600.txt

I get:

    python 0.03s user 0.01s system 83% cpu 0.059 total
    go 0.00s user 0.00s system 80% cpu 0.010 total
These are toy programs, but IME these gaps persist as your programs get bigger.

dekhn 4 days ago | parent | prev [-]

It's not interpreting; Python is loading the already byte-compiled version. But it's also statting several files (various extensions).

I believe in the past people have looked at putting the standard library in a zip file instead of splatting it out into a bunch of files in a directory tree. In that case, I think Python would just do a few stats, find the zipfile, load the whole thing into RAM, and then index into the file.
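
That mechanism (zipimport) is built in; here's a minimal sketch with made-up file and module names:

    import sys, zipfile

    with zipfile.ZipFile("bundle.zip", "w") as zf:    # hypothetical bundle
        zf.writestr("greet.py", "def hello():\n    return 'hi'\n")

    sys.path.insert(0, "bundle.zip")                  # zipimport handles imports from here
    import greet
    print(greet.hello())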

maccard 4 days ago | parent [-]

> In that case, I think Python would just do a few stats, find the zipfile, load the whole thing into RAM, and then index into the file.

"If python was implemented totally different it might be fast" - sure, but it's not!

dekhn 4 days ago | parent [-]

No, this feature already exists.

maccard 4 days ago | parent [-]

Great - how do I use it?

kortex 3 days ago | parent [-]

You should look at the self-executing .pex file format (https://docs.pex-tool.org/whatispex.html). The whole Python program exists as a single file. You can also unzip the .pex and inspect the dependency tree.

It's tooling-agnostic and there are a couple of ways to generate them, but the easiest is to just use Pants build.

Pants also does dependency traversal (that's the main reason we started using it, deploying a microservices monorepo) so it only packages the necessary modules.

I haven't profiled it yet for cold starts, maybe I'll test that real quick.

https://www.pantsbuild.org/dev/docs/python/overview/pex

Edit: just ran it on a hello world with Python 3.14 on an M3 MacBook Pro: about 100 +/- 30 ms for `python -m hello` and 300-400 ms (but with wild variance) for executing the pex with `./hello/binary.pex`.

I'm not sure if a pants expert could eke out more speed gains and I'm also not sure if this strategy would win out with a lot of dependencies. I'm guessing the time required to stat every imported file pales in comparison to the actual load time, and with pex, everything needs to be unzipped first.

Pex is honestly best when you want to build and distribute an application as a single file (there are flags to bundle the Python interpreter too).

The other option is mypyc, though again that seems to mostly speed up runtime: https://github.com/mypyc/mypyc

Now if I use `python -S` (which disables `import site` on initialization), that gets down to ~15ms execution time for hello world. But that gain gets killed as soon as you start trying to import certain modules (there's a very limited set of modules you can use and still keep the speedup, so if your whole script is pure Python with no imports, you could probably have a 20ms cold start).
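
For reference, here's a quick way to do that comparison yourself (just a sketch; numbers will obviously vary by machine):

    import subprocess, sys, time

    def startup_ms(args):
        # Time a full interpreter launch (including process spawn) that runs nothing.
        t0 = time.perf_counter()
        subprocess.run([sys.executable, *args, "-c", "pass"], check=True)
        return (time.perf_counter() - t0) * 1000

    print(f"default: {startup_ms([]):6.1f} ms")
    print(f"with -S: {startup_ms(['-S']):6.1f} ms")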

tlyleung 4 days ago | parent | prev [-]

Just a guess - but perhaps the startup time is before `time` is even imported?

williadc 4 days ago | parent [-]

`time` is a shell command that you can use to invoke other commands and track their runtime.