jiggawatts 2 hours ago

There's a difference between some piece of information being "officially published" and the AIs gaining a sufficient understanding of it.

Take any popular technology problem that has been around for a few years such as... wrangling Kubernetes with YAML config files. There's probably hundreds of thousands of discussions, source code samples from GitHub, official docs, blogs, bug reports, pull requests, etc... all discussing the nuances, pitfalls, pros/cons, etc. During pre-training the AIs internalise this and can utilise it later.
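A classic example of the kind of pitfall those threads hash out is YAML 1.1's implicit boolean parsing, the so-called "Norway problem". This fragment is an illustrative sketch, not from any real manifest:

```yaml
metadata:
  labels:
    # Unquoted, a YAML 1.1 parser reads NO as the boolean false,
    # which then trips Kubernetes' string-only label validation:
    country: NO
    # Quoting forces the scalar to stay the string "NO":
    country-quoted: "NO"
```

An LLM only learns gotchas like this because thousands of bug reports and Stack Overflow answers spell them out.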

Now compare this with anything recent and (relatively) obscure, such as new .NET 10 features which were first officially published in November 2025, a month before GPT 5.5's cutoff.

As a human developer, these new language capabilities are on the same "level" for me in my day-to-day work as the features from .NET 9 or .NET 8. Similarly, my IDE has native refactoring and code cleanup support that can take C# code from the previous years and bring it up to the idiomatic style of $currentyear.
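For instance (a made-up snippet, purely to illustrate the kind of mechanical cleanup an IDE offers), the same declaration moves through successive idioms:

```csharp
using System;
using System.Collections.Generic;

class ModernizeDemo
{
    static void Main()
    {
        // Older idiom: the type spelled out on both sides.
        List<string> oldStyle = new List<string> { "a", "b" };

        // C# 9 target-typed new:
        List<string> midStyle = new() { "a", "b" };

        // C# 12 collection expression, the idiomatic form today:
        List<string> newStyle = ["a", "b"];

        Console.WriteLine(newStyle.Count); // prints 2
    }
}
```

The IDE can apply rewrites like these mechanically because the transformation is purely syntactic; it doesn't need training data.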

The AIs just can't do this, because a single Microsoft release note and one learn.microsoft.com page are nowhere near enough training data! The AI hasn't seen millions of lines of code written with .NET 10, taking advantage of .NET 10 improvements, and hasn't seen thousands of discussions about it. Not yet.

This is a fundamental issue with how LLMs are (currently) trained! Simply moving the cutoff date is not enough.

Human learning is second-order. If I see even the tiniest bit of updated information that invalidates a huge pile of older information, my memory marks everything old as outdated and from that second onwards I use only the new approach.

AI learning is first-order. It has to be given the discussions/blogs/posts that say "Stop using the legacy way, it's terrible! Start using the new hotness!" That, it can learn, but it'll be perpetually behind the rest of us by at least a few years.

Not to mention that, thanks to AI, forums like StackOverflow are dying, so... where is it going to get this kind of training data in the future!?

AI training needs to switch to "second order", but AFAIK this is an unsolved problem at this time.