Remix.run Logo
jordanb 3 hours ago

It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

Organizations struggled with it but they struggle with basically every breaking change. I was on the tooling team that helped an organization handle the transition of about 5 million lines of data science code from python 2.7 to 3.2. We also had to handle other breaking changes like airflow upgrades, spark 2->3, intel->amd->graviton.

At that scale all those changes are a big deal. Heck even the pickle protocol change in Python 3.8 was a big deal for us. I wouldn't characterize the python 2->3 transition as a significantly bigger deal than some of the others. In many ways it was easier because so much hay was made about it there was a lot of knowledge and tooling.

xscott 2 hours ago | parent | next [-]

> It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

They should've just used Python 2's strings as UTF-8. No need to break every existing program, just deprecate and discourage the old Python Unicode type. The new Unicode type (Python 3's string) is a complicated mess, and anyone who thinks it is simple and clean isn't aware of what's going on under the hood.

Having your strings be a simple array of bytes, which might be UTF-8 or WTF-8, seems to be working out pretty well for Go.

MangoToupe an hour ago | parent [-]

I can't say i've ever thought "wow I wish I had to use go's unicode approach". The bytes/str split is the cleanest approach of any runtime I've seen.

JoshTriplett 2 hours ago | parent | prev [-]

With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Imagine if the same interpreter supported both Python 3 and Python 2. Python 3 code could import a Python 2 module, or vice versa. Codebases could migrate somewhat more incrementally. Python 2 code's idea of a "string" would be bytes, and python 3's idea of a "string" would be unicode, but both can speak the other's language, they just have different names for things, so you can migrate.

kstrauser 8 minutes ago | parent | next [-]

That split between bytes and unicode made better code. Bytes are what you get from the network. Is it a PNG? A paragraph of text? Who knows! But in Python 2, you treated them both as the same thing: a series of bytes.

Being more or less forced to decode that series into a string of text where appropriate made a huge number of bugs vanish. Oops, forget to run `value=incoming_data.decode()` before passing incoming data to a function that expects a string, not a series of bytes? Boom! Thing is, it was always broken, but now it's visibly broken. And there was no more having to remember if you'd already .decode()d a value or whether you still needed to, because the end result isn't the same datatype anymore. It was so annoying to have an internal function in a webserver, and the old sloppiness meant that sometimes you were calling it with decoded strings and sometimes the raw bytes coming in over the wire, so sometimes it processed non-ASCII characters incorrectly, and if you tried to fix it by making it decode passed-in values, it start started breaking previously-working callers. Ugh, what a mess!

I hated the schism for about the first month because it broke a lot of my old, crappy code. Well, it didn't actually. It just forced me to be aware of my old, crappy code, and do the hard, non-automatable work of actually fixing it. The end result was far better than what I'd started with.

MangoToupe an hour ago | parent | prev [-]

> With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Not without enormous and unnecessary pain.

JoshTriplett an hour ago | parent [-]

It would absolutely have been harder. But the pain of going that path might potentially have been less than the pain of the Python 2 to Python 3 transition. Or, possibly, it wouldn't have been; I'm not claiming the tradeoff is obvious even in hindsight here.