Something not unlike this happened to me when moving some batch processing code from C++ to Python 1.4 (this was 1997). The batch started finishing about 10x faster. We refused to believe it at first and started looking to make sure the work was actually being done. It was.

The port had been done in a weekend just to see if we could use Python in production. The C++ code had taken a few months to write. The port was pretty direct, function for function. It was even line for line where language and library differences didn't offer an easier way.

A couple of us worked together for a day to find the reason for the speedup. Just looking at the code didn't give us any clues, so we started profiling both versions. We found out that the port had accidentally fixed a previously unknown bug in some code that built and compared cache keys. After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.

We immediately started moving the rest of our back end to Python. Most things were slower, but not by much, because most of our back end was I/O-bound. We soon found that we could make algorithmic improvements much more quickly in Python, so a lot of the slowest things got a lot faster than they had ever been. And, most importantly, we (the software developers) got quite a bit faster.

DaleBiagio 2 hours ago

This is the argument Grace Hopper made in the 1950s when she was pushing for high-level languages.

Her colleagues insisted that compilers could never match hand-written assembly.

She argued that programmer productivity mattered more, that humans working in a language closer to their thinking would write better programs faster, and the net result would be better software. Seventy years later, the same pattern keeps playing out.

apitman 35 minutes ago

I don't think the better software part is playing out.

remexre 5 minutes ago

you're thinking of the programs in low-level langs that survived their higher-level-lang competitors; if you plot the programs on your machine by age, how does the low quartile compare on reliability between programs written in each group?

ch4s3 8 minutes ago

There’s a lot of really great software out there right now, and a lot that’s terrible, and I think powerful abstractions enable both.

asveikau an hour ago

> After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.

Pure speculation, but I would guess this has something to do with a copy constructor getting invoked in a place you wouldn't guess, that ends up in a critical path.

andrewflnr 40 minutes ago

Given the context, I'm thinking bad cache keys resulting in spurious cache misses, where the keys are built in some low-level way. Cache misses almost certainly have a bigger asymptotic impact than extra copies, unless that copy constructor is really heavy.
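
To make that guess concrete, here is a minimal Python sketch of the failure mode, not a reconstruction of the original bug (which nobody in the thread remembers). `RawKey`, `expensive`, and `count_misses` are hypothetical names. `RawKey` stands in for a key that is compared by identity, loosely like comparing pointers instead of contents, so two keys built from the same parts never match.

```python
class RawKey:
    """Hypothetical key with no __eq__/__hash__: compared by identity only."""

    def __init__(self, *parts):
        self.parts = parts


def expensive(parts):
    # Stand-in for the real work the cache is supposed to avoid.
    return sum(len(str(p)) for p in parts)


def count_misses(make_key, requests):
    """Run requests through a dict cache and count how many were misses."""
    cache = {}
    misses = 0
    for parts in requests:
        key = make_key(parts)
        if key not in cache:
            misses += 1
            cache[key] = expensive(parts)
    return misses


# 100 requests, but only 2 distinct inputs.
requests = [("a", 1), ("b", 2)] * 50

# Every RawKey is a fresh object, so every lookup misses.
identity_misses = count_misses(lambda parts: RawKey(*parts), requests)

# Tuples compare and hash by value, so only the first of each input misses.
value_misses = count_misses(tuple, requests)

print(identity_misses, value_misses)  # 100 2
```

Tuples and strings compare by value by default in Python, which may be part of why this kind of bug is harder to write there by accident.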

asveikau 34 minutes ago

I'm just remembering a performance issue I heard of eons ago where a sorting function comparison callback inadvertently allocated memory. It made sorting very slow. Someone said in a meeting that sorting was slow, and we all had a laugh about "shouldn't have used the bubble sort!" But it was the key comparison doing something stupid.
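
A rough Python illustration of that kind of problem, with made-up names (`normalize`, `slow_compare`) and no claim that it matches the original code: a comparison callback that does allocating work on every call runs that work many times per element, while a key function runs it once per element.

```python
from functools import cmp_to_key

calls = {"normalize": 0}


def normalize(word):
    # Stand-in for per-comparison work that allocates (a new lowercase string).
    calls["normalize"] += 1
    return word.lower()


def slow_compare(a, b):
    # Normalizes both sides on every single comparison.
    x, y = normalize(a), normalize(b)
    return (x > y) - (x < y)


words = ["pear", "Apple", "fig", "Banana", "cherry", "Date", "kiwi", "Elder"]

calls["normalize"] = 0
by_cmp = sorted(words, key=cmp_to_key(slow_compare))
cmp_calls = calls["normalize"]  # two calls per comparison

calls["normalize"] = 0
by_key = sorted(words, key=normalize)
key_calls = calls["normalize"]  # exactly one call per element

print(by_cmp == by_key, cmp_calls, key_calls)
```

Both sorts give the same order. The gap between `cmp_calls` and `key_calls` widens as the list grows, since the number of comparisons grows roughly as n log n.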

NooneAtAll3 an hour ago

good ol' shallow-vs-deep copy

asa400 2 hours ago

Fun story! Performance is often highly unintuitive, and even counterintuitive (e.g. going from C++ to Python). Very much an art as well as a science.

Crazy how many stories like this I’ve heard of how doing performance work helped people uncover bugs and/or hidden assumptions about their systems.

envguard 2 hours ago

Agreed — the headline buries the lede. Algorithmic complexity improvements compound across all future inputs regardless of implementation language, while the WASM boundary win is more of a one-time gain. Worth noting that the statement-level caching insight generalises well: many parser-adjacent hot paths suffer the same O(N²) trap when doing repeated prefix/suffix matching without memoisation.
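
A toy sketch of that trap, assuming a hypothetical incremental scanner that counts `;`-terminated statements as chunks arrive. This is not the parser from the article. The function names are invented for illustration. The first version re-scans the whole buffer after every chunk, and the second keeps the result for the prefix it has already scanned.

```python
def count_statements_rescan(chunks):
    """Re-scan the entire buffer after each chunk: quadratic total work."""
    buffer = ""
    scanned = 0
    count = 0
    for chunk in chunks:
        buffer += chunk
        count = 0
        for ch in buffer:
            scanned += 1
            if ch == ";":
                count += 1
    return count, scanned


def count_statements_cached(chunks):
    """Keep the count for the already-scanned prefix: linear total work."""
    scanned = 0
    count = 0
    for chunk in chunks:
        for ch in chunk:
            scanned += 1
            if ch == ";":
                count += 1
    return count, scanned


chunks = ["x=1;"] * 100  # 100 chunks of 4 characters each

print(count_statements_rescan(chunks))  # (100, 20200)
print(count_statements_cached(chunks))  # (100, 400)
```

Both versions report the same 100 statements, but the re-scanning one touches 20,200 characters against 400 for the cached one, and that ratio keeps growing with input size.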