It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

Organizations struggled with it but they struggle with basically every breaking change. I was on the tooling team that helped an organization handle the transition of about 5 million lines of data science code from python 2.7 to 3.2. We also had to handle other breaking changes like airflow upgrades, spark 2->3, intel->amd->graviton.

At that scale all those changes are a big deal. Heck even the pickle protocol change in Python 3.8 was a big deal for us. I wouldn't characterize the python 2->3 transition as a significantly bigger deal than some of the others. In many ways it was easier because so much hay was made about it there was a lot of knowledge and tooling.

▲

xscott 2 hours ago | parent | next [-]

> It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

They should've just used Python 2's strings as UTF-8. No need to break every existing program, just deprecate and discourage the old Python Unicode type. The new Unicode type (Python 3's string) is a complicated mess, and anyone who thinks it is simple and clean isn't aware of what's going on under the hood.

Having your strings be a simple array of bytes, which might be UTF-8 or WTF-8, seems to be working out pretty well for Go.

	▲	MangoToupe an hour ago \| parent [-]
		I can't say i've ever thought "wow I wish I had to use go's unicode approach". The bytes/str split is the cleanest approach of any runtime I've seen.

▲

JoshTriplett 2 hours ago | parent | prev [-]

With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Imagine if the same interpreter supported both Python 3 and Python 2. Python 3 code could import a Python 2 module, or vice versa. Codebases could migrate somewhat more incrementally. Python 2 code's idea of a "string" would be bytes, and python 3's idea of a "string" would be unicode, but both can speak the other's language, they just have different names for things, so you can migrate.

▲

kstrauser 8 minutes ago | parent | next [-]

That split between bytes and unicode made better code. Bytes are what you get from the network. Is it a PNG? A paragraph of text? Who knows! But in Python 2, you treated them both as the same thing: a series of bytes.

Being more or less forced to decode that series into a string of text where appropriate made a huge number of bugs vanish. Oops, forget to run `value=incoming_data.decode()` before passing incoming data to a function that expects a string, not a series of bytes? Boom! Thing is, it was always broken, but now it's visibly broken. And there was no more having to remember if you'd already .decode()d a value or whether you still needed to, because the end result isn't the same datatype anymore. It was so annoying to have an internal function in a webserver, and the old sloppiness meant that sometimes you were calling it with decoded strings and sometimes the raw bytes coming in over the wire, so sometimes it processed non-ASCII characters incorrectly, and if you tried to fix it by making it decode passed-in values, it start started breaking previously-working callers. Ugh, what a mess!

I hated the schism for about the first month because it broke a lot of my old, crappy code. Well, it didn't actually. It just forced me to be aware of my old, crappy code, and do the hard, non-automatable work of actually fixing it. The end result was far better than what I'd started with.

▲

MangoToupe an hour ago | parent | prev [-]

> With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Not without enormous and unnecessary pain.

	▲	JoshTriplett an hour ago \| parent [-]
		It would absolutely have been harder. But the pain of going that path might potentially have been less than the pain of the Python 2 to Python 3 transition. Or, possibly, it wouldn't have been; I'm not claiming the tradeoff is obvious even in hindsight here.