Oh man, Python 2 > 3 was such a massive shift. Took almost half a decade if not more, and yet it was mainly changing superficial syntax stuff. They should have allowed ABIs to break and gotten these internal things done. Probably come up with a new, tighter API for integrating with other lower-level languages, so going forward Python internals can be changed more freely without breaking everything.

▲ scorpioxy 13 hours ago | parent | next [-]

The text encoding stuff wasn't a small change considering what it could break, at least. And remember we're sometimes talking about software that would cost a lot of money to migrate or upgrade. I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.

Although your general sentiment is something I agree with (if it's going to be painful, do it and get it over with), I don't believe anybody knew or could've guessed what the reaction of the ecosystem would be.

Your last point about being able to change internals more freely is also great in theory but very difficult (if not impossible) to achieve in practice.

I don't know. Having maintained some small projects that were free and open source, I saw the hostility and entitlement that can come from that position. And those projects were a speck of dust next to something like Python. So I think the core team is doing the best they can. It was always going to be damned if you do, damned if you don't.

▲ eru 10 hours ago | parent [-]

> I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.

Slight tangent: if Claude can decimate IBM stock price by migrating off Cobol for cheap, surely we can do Python 2 to 3 now, too?

About the internals: we sort of missed an opportunity there, but back then they also didn't quite know what they were doing (or at least we have better ideas of what's useful today). And making the step from 2 to 3 even bigger might have been a bad idea?

▲ scorpioxy 9 hours ago | parent | next [-]

I wasn't aware that migrating projects off Cobol has become cheap and it would only take a Claude subscription.

In my experience, the problem had always been maintaining the business logic and any integrations with third-party software that also may be running legacy code-bases or have been abandoned. It can get quite complicated, from what I've seen. Now of course if you're talking about well maintained code-bases with 100%, or close to 100% test coverage, and that includes the integration part along with having the ability to maintain the user experience and/or user interface then yes it becomes a relatively easy process of "just write the code". But, in my experience, this has never been the case.

For the 2.x code-bases I maintain, the customers simply don't want to pay for any of it. They might choose to at a later time, but so far it has been more cost effective for them to pay me to maintain that legacy code than pay to have it migrated. Other customers have different needs and thus budget differently.

I'll refrain from judging if 2 to 3 was a missed opportunity or not. I believe the core team does actually know what they're doing and that any decision would've been criticized.

▲ eru 7 hours ago | parent | next [-]

> I believe the core team does actually know what they're doing and that any decision would've been criticized.

I agree with the latter. About the former: they probably made good decisions given the information available at the time. I mean that nowadays they know more than they did in the past.

▲ Tempest1981 8 hours ago | parent | prev [-]

IBM shares fell 13% in a single day last month:

"IBM Sinks Most Since 2000 as Anthropic Touts Cobol Tool"

https://finance.yahoo.com/news/ibm-sinks-most-since-2000-210...

It may not be "cheap", but possibly cheaper than IBM's consulting.

▲ PurpleRamen an hour ago | parent | next [-]

Share-pricing operates on illusions. Just selling a plausible claim can influence the price. Whether they will deliver at the end, doesn't matter at that moment.

▲ eru an hour ago | parent [-]

Feel free to correct the market and make oodles of money.

▲ scorpioxy 8 hours ago | parent | prev | next [-]

I skip news like that. It's an AI business hyping one of their tools in a major AI hype-cycle. Shares can go up and down based on sentiment. My point still stands.

To me, there's a big difference between saying that migration projects can now be assisted with some AI tooling and saying that it is cheap and to just get Claude to do it.

Maybe I am out of touch but the former is realistic and the latter is just magical hand-waving.

▲ Marazan 4 hours ago | parent | prev [-]

IBM share price is back to where it was pre-Anthropic press release.

▲ thaumasiotes 3 hours ago | parent [-]

Sure, but imagine how much higher it would have gone in the counterfactual world where Anthropic didn't have an automatic port-from-Cobol tool.

▲ Maxion 2 hours ago | parent [-]

Remember that those who trade on the stock market are not programmers with decades of experience writing Cobol.

▲ CJefferson 6 hours ago | parent | prev [-]

Absolutely, I had a 2 -> 3 code base I'd mostly given up on, and Claude was amazing. It even re-wrote some libraries I used that had no py3 versions; it decided to just write the parts of the libraries I needed.

It does much better with good tests. In my case the output was a statically generated website, so I could just say 'make the same website, given these inputs'.

▲ smcl 10 hours ago | parent | prev | next [-]

I cannot believe people are still acting like Python 2->3 was a huge fuck-up and an enormous missed opportunity. When in reality Python is by most measures the most popular language and became so AFTER that switch.

Since the switch we have seen enormous companies being built from scratch. There is no reason for anyone to be complaining about it being too hard to upgrade in 2026.

▲ rtpg 6 hours ago | parent | next [-]

Living through it... Python 3 made a lot of changes for the better but 3.0 in particular included a bunch of unforced errors that made it too hard for people to upgrade in one go.

It wasn't until much later (I would say 3.4 or 3.5?) that we had good tooling to allow for migrating from Python 2 to Python 3 gradually, which is what most tools needed to do.

The final thing that made Python upgrading easy was making a bunch of changes (along with stuff like six) so that you could write code that would run identically in Python 2 and Python 3. That lets you do refactors over time, little cleanups, and not have the huge "move to Python 3" commit.
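
As a rough illustration of that single-codebase style, here is a minimal sketch that sticks to the standard library rather than six. The helper name `to_text` is made up for this example; the `__future__` imports are real and are harmless no-ops on Python 3.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
PY2 = sys.version_info[0] == 2
text_type = unicode if PY2 else str  # noqa: F821 (name only exists on Python 2)


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


# The same text comes out whether the input was bytes or text.
print(to_text(b"caf\xc3\xa9") == to_text("caf\xe9"))
# True division on both versions because of the __future__ import.
print(7 / 2)
```

Code written this way can be cleaned up file by file while still running under 2.7, which is the gradual path described above.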

▲ badsectoracula 8 hours ago | parent | prev | next [-]

> Python is by most measures the most popular language and became so AFTER that switch

The switch had nothing to do with Python's rise in popularity though; it was because of NumPy and later PyTorch being adopted by data scientists and later for machine learning tasks, which themselves became very popular. Python's popularity rose alongside those.

> There is no reason for anyone to be complaining about it being too hard to upgrade in 2026

The "complaints" are about unnecessary and pointless breakage that made it very difficult for many codebases to upgrade for years. That by now most of these codebases have been either abandoned, upgraded or decided to stick with Python 2 until the end of time doesn't mean these pains didn't happen, nor that the language's developers inflicting them on their users was a good idea because some largely unrelated external factors made the language popular several years later.

▲ Izkata 7 hours ago | parent [-]

> that was very difficult for many codebases to upgrade for years.

In case people have forgotten: Python 3.3 through 3.5 (and 3.6 I think) each had to reintroduce something that had been removed, in order to make the upgrade easier. Jumping from 2.7 to 3.3 (or higher depending on what you needed) was the recommended route because of this; it was less work than going to 3.0, 3.1, or 3.2.
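
The comment doesn't name the features, but two well-known cases are the `u''` literal prefix, which came back in Python 3.3 (PEP 414), and `%`-formatting for bytes, which came back in Python 3.5 (PEP 461). A small sketch, assuming Python 3.5 or later:

```python
# u'' prefix: removed in 3.0, accepted again since 3.3.
greeting = u"hello"
print(greeting == "hello")

# %-formatting on bytes: removed in 3.0, available again since 3.5.
header = b"Content-Length: %d\r\n" % 42
print(header)
```

Both lines are also valid Python 2.7, which is why restoring them made shared 2/3 codebases much less painful.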

▲ 20k 9 hours ago | parent | prev | next [-]

It took a long time for Python 3 to add the necessary backwards compatibility features to allow people to switch over. Once they did it was fine, but it was a massive fuck up until then. The migration took far longer than it should have done.

It's widely regarded as a disaster for good reason, and that forced some corrections in Python to fix it. Just because it's fine now does not mean it was always fine.

▲ bmitc 6 hours ago | parent | prev [-]

Those are unrelated.

▲ nurettin 7 hours ago | parent | prev | next [-]

The biggest (and worst planned) change was module names. Your imports didn't work, forcing hacks like

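    import sys

    if sys.version_info.major == 2:
        import old  # placeholder for the Python 2 module name
    else:
        import new  # placeholder for the renamed Python 3 module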

Or worse, people used try/except in their imports.
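
For a concrete case of the try/except variant: `urlparse` lived in a top-level `urlparse` module on Python 2 and moved to `urllib.parse` on Python 3. A sketch of the pattern, shown for illustration rather than as a recommendation:

```python
try:
    # Python 3 location
    from urllib.parse import urlparse
except ImportError:
    # Python 2 location
    from urlparse import urlparse

parts = urlparse("https://docs.python.org/3/library/urllib.parse.html")
print(parts.netloc)  # docs.python.org
```

The downside is that an unrelated ImportError raised inside the first module gets silently swallowed by the fallback.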

▲ jmspring 9 hours ago | parent | prev | next [-]

still GIL

▲ marcyb5st an hour ago | parent [-]

Opt-in starting from 3.15, or am I mistaken? Anyway you can already try free-threaded builds that have the GIL disabled, but my experience is that most of your dependencies won't work.
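
To check what a given interpreter is actually doing, something like the following should work. `sys._is_gil_enabled()` is a private function that exists on Python 3.13 and later, so this sketch falls back to assuming the GIL is on when it is missing; `gil_status` is a made-up helper name.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled_now) for this interpreter."""
    # Py_GIL_DISABLED is set to 1 on free-threaded builds, unset or 0 otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # Older interpreters lack this function; there the GIL is always enabled.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = check() if check is not None else True
    return free_threaded_build, gil_enabled_now


print(gil_status())
```

The two values can differ: a free-threaded build may still turn the GIL back on at runtime, for example when an extension module that doesn't declare free-threading support is imported.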

▲ gjvc 13 hours ago | parent | prev [-]

yes. it was not a massive shift. it was barely worth the effort.

▲ pansa2 13 hours ago | parent | next [-]

The Python devs didn’t want to make huge changes because they were worried Python 3 would end up taking forever like Perl 6. Instead they went to the other extreme and broke everyone’s code for trivial reasons and minimal benefit, which meant no-one wanted to upgrade.

Even the main driver for Python 3, the bytes-Unicode split, has unfortunately turned out to be sub-optimal. Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.

▲ diziet_sma 11 hours ago | parent | next [-]

> Python essentially bet on UTF-32 (with space-saving optimisations)

How so? Python3 strings are unicode and all the encoding/decoding functions default to utf-8. In practice this means all the python I write is utf-8 compatible unicode and I don't ever have to think about it.
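
A quick check of the defaults mentioned here, assuming Python 3. Note that this covers `str.encode()` and `bytes.decode()`; `open()` without an `encoding` argument has historically followed the locale instead.

```python
text = "\u65e5\u672c\u8a9e"   # 日本語

data = text.encode()          # no argument: UTF-8
print(data)
print(data.decode() == text)  # bytes.decode() also defaults to UTF-8
```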

▲ sheept 11 hours ago | parent | next [-]

UTF-32 allows for constant time character accesses, which means that mystr[i] isn't O(n). Most other languages can only provide constant time access for code units.

▲ msl 4 hours ago | parent [-]

UTF-32 allows for constant time access to code points. Neither UTF-8 nor UTF-16 can do the same (there are a little over 2 to the power of 20 valid code points, though not all are in use). While most characters might be encodable as a single code point, Python does not normalize strings, so there is no guarantee that even relatively normal characters are actually stored as single code points. Try this in Python:

    s = "a\u0308"
    print(s)
    print(s[0])

You will see:

    ä
    a
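
To make the code point versus character distinction concrete, here is a short sketch using the standard `unicodedata` module. Normalizing to NFC composes the pair into a single code point, which is something you have to ask for explicitly.

```python
import unicodedata

decomposed = "a\u0308"                      # 'a' + COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))       # 2 1
print(composed == "\u00e4")                 # True
print(decomposed == composed)               # False: no implicit normalization
```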

▲ pansa2 11 hours ago | parent | prev | next [-]

> all the encoding/decoding functions default to utf-8

Languages that use UTF-8 natively don't need those functions at all. And the ones in Python aren't trivial - see, for example, `surrogateescape`.

As the sibling comment says, the only benefit of all this encoding/decoding is that it allows strings to support constant-time indexing of code points, which isn't something that's commonly needed.
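
For anyone who hasn't run into it, a minimal sketch of what `surrogateescape` does, assuming Python 3. The file name is invented for the example. Bytes that aren't valid UTF-8 are smuggled through as lone surrogates so the original bytes can be recovered later.

```python
raw = b"report-\xff.txt"  # not valid UTF-8

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# The undecodable byte 0xFF becomes the lone surrogate U+DCFF.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
print(name.encode("utf-8", errors="surrogateescape") == raw)
```

The catch is that `name` is not encodable with the default strict handler, so such strings can fail later, far away from where they were created.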

▲ laurencerowe 10 hours ago | parent [-]

They absolutely do because random byte strings are not valid UTF-8. Safe Rust requires validating bytes when converting to strings because of this.

▲ cloudbonsai 8 hours ago | parent | prev [-]

Internally Python holds a string as an array of uint32. A utf-8 representation is created on demand from it (and cached). So pansa2 is basically correct [1].

IMO, while this may not be optimal, it's far better than the more arcane choice made by other systems. For example, due to reasons only Microsoft can understand, Windows is stuck with UTF-16.

[1] Actually it's more intelligent. For example, Python automatically uses uint8 instead of uint32 for ASCII strings.

▲ zahlman 5 hours ago | parent | next [-]

There is no caching of a "utf-8 representation". You may check for example:

  >>> x = '日本語'*100000000
  >>> import time
  >>> t = time.time(); y = x.encode(); time.time() - t # takes nontrivial time
  >>> t = time.time(); y = x.encode(); time.time() - t # not cached; not any faster

Generally, the only reason this would happen implicitly is for I/O; actual operations on the string operate directly on the internal representation.

Python uses either 8, 16 or 32 bits per character according to the maximum code point found in the string; uint8 is thus used for all strings representable in Latin-1, not just "ASCII". (It does have other optimizations for ASCII strings.)
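
The per-string width can be observed from Python itself. This is a sketch that assumes CPython 3.3 or later, since `sys.getsizeof` reports implementation details; `bytes_per_char` is a made-up helper that cancels out the fixed object header by comparing two strings of different lengths.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by comparing two strings made of `ch`."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


# ASCII, Latin-1, BMP, and astral examples respectively.
for ch in ("a", "\u00e9", "\u65e5", "\U0001F600"):
    print("U+%04X" % ord(ch), bytes_per_char(ch))
```

On CPython this should report 1, 1, 2 and 4, with the Latin-1 character "é" sharing the 1-byte width with ASCII.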

The reason for Windows being stuck with UTF-16 is quite easy to understand: backwards compatibility. Those APIs were introduced before there were supplementary Unicode planes, such that "UTF-16" could be equated with UCS-2; then the surrogate-pair logic was bolted on top of that. Basically the same thing that happened in Java.

▲ cloudbonsai 2 hours ago | parent [-]

> There is no caching of a "utf-8 representation".

No, there certainly is. This is documented in the official API documentation:

> UTF-8 representation is created on demand and cached in the Unicode object.

https://docs.python.org/3/c-api/unicode.html#unicode-objects

In particular, Python's Unicode object (PyUnicodeObject) contains a field named utf8. This field is populated when PyUnicode_AsUTF8AndSize() is first called and reused thereafter. You can check the exact code I'm talking about here:

https://github.com/python/cpython/blob/main/Objects/unicodeo...

Is it clear enough?
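
The cache can be poked at from Python via `ctypes`, assuming CPython with `ctypes` available. This is only a sketch: it calls the C API function named above twice and compares the returned buffer addresses. The two claims in this subthread aren't necessarily in conflict, since the cache is filled by the C API call, while the earlier timing test measures `str.encode()`, which has to build a new bytes object on every call.

```python
import ctypes
import sys

as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
as_utf8.restype = ctypes.c_void_p  # keep the raw address instead of copying bytes

s = "\u65e5\u672c\u8a9e" * 1000    # 日本語 repeated, non-ASCII so no shared buffer
size = ctypes.c_ssize_t()

before = sys.getsizeof(s)
first = as_utf8(s, ctypes.byref(size))
second = as_utf8(s, ctypes.byref(size))
after = sys.getsizeof(s)

print(first == second)                # same buffer address both times
print(size.value == len(s.encode()))  # reported size matches the UTF-8 length
print(after - before)                 # extra memory now attributed to the string
```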

▲ nslsm 7 hours ago | parent | prev | next [-]

Read first paragraph here https://devblogs.microsoft.com/oldnewthing/20190830-00/?p=10...

▲ zahlman 5 hours ago | parent | prev | next [-]

> Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.

It did nothing of the sort. UTF-8 is the default source file encoding and has been the target for many APIs. It likely would have been the default for all I/O stuff if we lived in a world where Windows had functioning Unicode in the terminal the whole time and didn't base all its internal APIs on UTF-16.

I assume you're referring to the internal representation of strings. Describing it as "UTF-32 with space-saving optimizations" is missing the point, and also a contradiction in terms. Yes, it is a system that uses the same number of bytes per character within a given string (and chooses that width according to the string contents). This makes random access possible. Doing anything else would have broken historical expectations about string slicing. There are good arguments that one shouldn't write code like that anyway, but it's hard to identify anything "sub-optimal" about the result except that strings like "I'm learning 日本語" use more memory than they might be able to get away with. (But there are other strings, like "ℍℯℓ℗", that can use a 2-byte width while the UTF-8 encoding would need 3 bytes per character.)
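
The two example strings can be compared directly. In this sketch `storage_width` is a hypothetical helper that mirrors the width rule described above (it is not a CPython API), and the byte counts cover character data only, ignoring object headers.

```python
def storage_width(s):
    """Bytes per character under the fixed-width-per-string scheme."""
    top = max(map(ord, s)) if s else 0
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


samples = ("I'm learning \u65e5\u672c\u8a9e", "\u210d\u212f\u2113\u2117")
for s in samples:
    fixed = storage_width(s) * len(s)
    utf8 = len(s.encode("utf-8"))
    print(len(s), "chars:", fixed, "bytes fixed-width vs", utf8, "bytes UTF-8")
```

The first string costs 32 bytes fixed-width against 22 in UTF-8; the second costs 8 against 12, so which representation is smaller depends on the text.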

▲ rjh29 12 hours ago | parent | prev [-]

Ironically Perl 5 managed to do the bytes-Unicode split with a feature gate, no giant major version change.

▲ gjvc 7 hours ago | parent | prev [-]

this must be right, i'm getting downvoted

▲ zahlman 5 hours ago | parent | next [-]

Please don't do this.

▲ boxed 5 hours ago | parent | prev [-]

It's wrong. Python 3 eliminated mountains of annoying bugs that happened all over the code base because of mixing of unicode strings and byte strings. Python 2 was an absolute mess.