Patterns for Faster Python Code(blog.jetbrains.com)
5 points by birdculture 10 hours ago | 1 comment
zahlman 5 hours ago

> This is a guest post from Dido Grigorov, a deep learning engineer and Python programmer with 17 years of experience in the field.

This is definitely not the sort of thing that takes 17 years of experience to write up.

There isn't a clear distinction drawn between big-O savings and micro-optimizations; the former are mostly CS fundamentals (especially the set-lookup thing in point 1), and you're left to infer (or already know) which is which. There's also zero distinction between things that are specific to Python (or, more narrowly, to the CPython implementation) and things that every programmer should know (and often just doesn't think about; cf. https://danluu.com/algorithms-interviews/).
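
(For reference, the set-lookup point is just the classic O(n)-vs-O(1) membership test; a quick sketch of my own, not the article's code:)

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# List membership is O(n): CPython compares elements one by one.
# Set membership is O(1) on average: a single hash lookup.
print(timeit.timeit(lambda: 99_999 in haystack_list, number=1_000))
print(timeit.timeit(lambda: 99_999 in haystack_set, number=1_000))
```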

The timing is naive, and the supposed benefits aren't even all reproducible. In particular, the pre-allocation strategy (point 5) only makes sense if you can reuse the pre-allocated storage (which, for a lot of algorithms in Python, means tracking the number of used elements manually, since it won't be fixed). On my machine, with a recent Python, I consistently get the opposite result for the demos; the dynamic allocation is slightly faster. (But of course this is a silly toy example, where you get even better performance from `list(range(1000000))`, which is how it's done in point 4!) Similarly, the performance difference with `itertools.product` is less dramatic with a proper timing technique, and becomes much less dramatic still when the list is assembled with a list comprehension rather than repeated appends.
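
(By "proper timing technique" I mean something like `timeit` with repeats, taking the minimum, rather than a one-shot wall-clock measurement. A sketch of the list-building comparison, with my own function names; actual numbers will vary by machine and Python version:)

```python
import timeit

N = 1_000_000

def appending():
    out = []
    for i in range(N):
        out.append(i)
    return out

def preallocated():
    out = [0] * N          # the article's "pre-allocation" strategy
    for i in range(N):
        out[i] = i
    return out

def comprehension():
    return [i for i in range(N)]

def builtin():
    return list(range(N))  # how point 4 of the article does it

for fn in (appending, preallocated, comprehension, builtin):
    # repeat and take the min to reduce scheduler/GC noise
    best = min(timeit.repeat(fn, number=10, repeat=5))
    print(f"{fn.__name__}: {best:.3f}s for 10 runs")
```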

The analysis is largely incomplete. The `__slots__` example is presented as a memory optimization (which it is), but then benchmarked for speed. And it's not compared to analogous use of `namedtuple`, dataclasses, etc.
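
(Measuring what `__slots__` actually buys you, per-instance memory, and comparing against the alternatives, looks more like this; a sketch with class names of my own:)

```python
import tracemalloc
from collections import namedtuple
from dataclasses import dataclass

class Plain:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Slotted:
    __slots__ = ("x", "y")  # no per-instance __dict__
    def __init__(self, x, y):
        self.x = x
        self.y = y

PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass(slots=True)  # Python 3.10+: dataclasses can generate __slots__ too
class SlottedDC:
    x: int
    y: int

def peak_memory(factory, n=100_000):
    tracemalloc.start()
    objs = [factory(i, i) for i in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

for cls in (Plain, Slotted, PointNT, SlottedDC):
    print(cls.__name__, peak_memory(cls))
```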

The last point is almost misleading; one expects a discussion of function call overhead and the trade-off of inlining, but actually it's about looking for repeated calculations of something that could be cached. Which... applies a lot more broadly than presented.
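
(The general pattern is just hoisting or memoizing an invariant computation; a minimal sketch, where `expensive` is a made-up stand-in:)

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(n):
    # stand-in for any pure computation that keeps being recomputed
    return sum(i * i for i in range(n))

# 1,000 calls, one actual computation; the rest are cache hits.
total = sum(expensive(10_000) for _ in range(1_000))

# Or, when the repetition is visible in one scope, just hoist it by hand:
base = expensive(10_000)
total = base * 1_000
```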

And of course, everything is written in a super-padded, self-important, LLM-ish style (sentences like "This technique is particularly valuable in numerical computations, simulations, and large-scale data processing, where even small optimizations can add up." are practically information-free). Which, of course, takes pains to shill for the IDE made by the publishers. (Did you know that our IDE helps you auto-complete references to standard-library module contents? Never mind that if you care about optimization at the level of choosing `math.sqrt` over the `**` operator for performance reasons, and for some reason can't choose a different language, you're probably also going to care about the cost of the name lookup.)
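
(If you do care at that level, the standard CPython trick is binding the attribute to a local name once, outside the loop; a sketch with names of my own:)

```python
import math

def roots_slow(values):
    # Each iteration pays a global lookup of `math` plus an attribute lookup of `sqrt`.
    return [math.sqrt(v) for v in values]

def roots_fast(values, sqrt=math.sqrt):
    # `sqrt` is bound once as a default argument, so each call is a fast local lookup.
    return [sqrt(v) for v in values]
```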

Oh, and the examples in point 6 aren't even equivalent! They compute different results, and the slower, exception-handling one also invokes floating-point math. These issues turn out not to affect the execution time much, but it still looks quite sloppy. (Not to mention, real-world code rarely raises exceptions this frequently in normal use, and when it does, the hot exception path won't be this obvious.)
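
(For comparison, here's a pair that actually computes the same thing: try/except vs. a pre-check, with the exception on the rare path where EAFP is usually argued for. My reconstruction, not the article's code:)

```python
import timeit

data = list(range(1, 100_001))  # no zeros, so the exception path stays rare

def eafp(values):
    out = []
    for v in values:
        try:
            out.append(100 // v)   # integer division: no float math involved
        except ZeroDivisionError:
            out.append(0)
    return out

def lbyl(values):
    out = []
    for v in values:
        out.append(100 // v if v != 0 else 0)
    return out

assert eafp(data) == lbyl(data)  # actually equivalent, unlike the article's pair
print(min(timeit.repeat(lambda: eafp(data), number=5, repeat=3)))
print(min(timeit.repeat(lambda: lbyl(data), number=5, repeat=3)))
```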