How we made Python's packaging library 3x faster (iscinumpy.dev)
46 points by rbanffy 4 days ago | 6 comments
djoldman 3 hours ago

> _canonicalize_table = str.maketrans(
>     "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
>     "abcdefghijklmnopqrstuvwxyz--",
> )
>
> ...
>
> value = name.translate(_canonicalize_table)
> while "--" in value:
>     value = value.replace("--", "-")

`translate` can be wildly fast compared to some commonly used regexes or string replacements.
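For comparison, here is a minimal sketch putting the quoted `str.maketrans`/`str.translate` approach next to the regex normalization given in PEP 503 (`re.sub(r"[-_.]+", "-", name).lower()`). The helper names `canonicalize_translate` and `canonicalize_regex` are hypothetical, and this is not claimed to be the exact code in `packaging`. For ASCII names the two should agree; how much faster `translate` is depends on the input and the Python version, so the timing loop is only a way to measure it yourself.

```python
import re
import timeit

# ASCII-only table from the quoted snippet:
# uppercase letters -> lowercase, "_" and "." -> "-".
_canonicalize_table = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    "abcdefghijklmnopqrstuvwxyz--",
)

# PEP 503 normalization: any run of "-", "_" or "." becomes a single "-".
_separator_run = re.compile(r"[-_.]+")


def canonicalize_translate(name: str) -> str:
    """Hypothetical helper: translate, then collapse repeated dashes."""
    value = name.translate(_canonicalize_table)
    # Adjacent separators (e.g. "._") become "--" and need collapsing.
    while "--" in value:
        value = value.replace("--", "-")
    return value


def canonicalize_regex(name: str) -> str:
    """Hypothetical helper: the PEP 503 regex, then lowercase."""
    return _separator_run.sub("-", name).lower()


if __name__ == "__main__":
    for name in ["Django", "zope.interface", "ruamel_yaml", "eggtools._spam"]:
        assert canonicalize_translate(name) == canonicalize_regex(name)
        print(name, "->", canonicalize_translate(name))

    # Rough timing only; numbers vary by machine and Python version.
    for fn in (canonicalize_translate, canonicalize_regex):
        t = timeit.timeit(lambda: fn("Typing_Extensions"), number=100_000)
        print(f"{fn.__name__}: {t:.3f}s")
```

The translate version does its lowercasing and separator mapping in one pass over the string and only enters the loop when two separators are adjacent, which is the situation discussed further down the thread.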

est 2 hours ago

I am curious: why not `.lower().translate(str.maketrans("_.", "--"))`?

fwip an hour ago

.lower() has to handle Unicode, right? I imagine the giant tables slow it down a bit.
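Note that `.translate('_.', '--')` would raise a `TypeError` in Python 3, because `str.translate` takes a single mapping table; the idea works once the two strings go through `str.maketrans`. The sketch below (helper names are hypothetical) shows that this variant matches the ASCII-only table on ASCII names but diverges on non-ASCII input, since `str.lower()` applies Unicode case mapping while the table only touches the letters A to Z. Whether that Unicode handling is what makes it slower is the guess above; this only demonstrates the behavioral difference.

```python
# ASCII-only table from the quoted snippet.
_canonicalize_table = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    "abcdefghijklmnopqrstuvwxyz--",
)
# Only the separators are remapped; lowercasing is left to str.lower().
_separator_table = str.maketrans("_.", "--")


def via_table(name: str) -> str:
    """Hypothetical helper: a single translate call with the combined table."""
    return name.translate(_canonicalize_table)


def via_lower(name: str) -> str:
    """Hypothetical helper: the suggested .lower() then translate variant."""
    return name.lower().translate(_separator_table)


if __name__ == "__main__":
    # ASCII names: both produce "foo-bar-baz".
    print(via_table("Foo_Bar.Baz"), via_lower("Foo_Bar.Baz"))
    # Non-ASCII: str.lower() maps "Ä" to "ä"; the ASCII table leaves it as is.
    print(via_table("Ärger_Lib"), via_lower("Ärger_Lib"))
```

Neither helper collapses repeated dashes; both would still need the `while "--" in value` step afterwards.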

teaearlgraycold 3 hours ago

I would expect, however, that a regex replacement would be much faster than your N^2 while loop.

notpushkin 2 hours ago

It would be, if it were a common situation.

This loop handles cases like `eggtools._spam` → `eggtools-spam`, which is probably rare (I guess it’s for packages that export namespaced modules, and you probably don’t want to export _private modules; sorry in advance for non-pythonic terminology). Having more than two separator characters in a row is even more unusual.
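To make the cost of the collapsing loop concrete, here is a small sketch that runs the quoted loop and counts its passes (`collapse_dashes` is a hypothetical name). Each `replace("--", "-")` pass turns a run of k dashes into ceil(k/2) dashes, so a run needs ceil(log2 k) passes, and a name with no adjacent separators costs only the single `"--" in value` check.

```python
import math


def collapse_dashes(value: str) -> tuple[str, int]:
    """Hypothetical helper: run the quoted while loop and count its passes."""
    passes = 0
    while "--" in value:
        value = value.replace("--", "-")
        passes += 1
    return value, passes


if __name__ == "__main__":
    # Typical name: no adjacent separators, so the loop body never runs.
    print(collapse_dashes("zope-interface"))   # ('zope-interface', 0)
    # The case above: "eggtools._spam" translates to "eggtools--spam".
    print(collapse_dashes("eggtools--spam"))   # ('eggtools-spam', 1)
    # A run of k dashes takes ceil(log2(k)) passes.
    for k in (2, 3, 4, 5, 8, 9):
        _, passes = collapse_dashes("a" + "-" * k + "b")
        print(k, passes, math.ceil(math.log2(k)))
```

Since each pass scans the whole string, the work is roughly the name's length times the number of passes, and for real package names that number is almost always zero or one.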

zahlman 3 days ago

Previously: https://news.ycombinator.com/item?id=46557542