m3kw9 a day ago

Even then, we don’t know what got updated and what didn’t. Can we assume everything that can be updated is updated?

diggan a day ago | parent | next [-]

> Can we assume everything that can be updated is updated?

What does that even mean? Of course an LLM doesn't know everything, so we wouldn't be able to assume everything got updated either. At best, if they shared the datasets they used (which they won't, because most likely they were acquired illegally), you could make some guesses about what they tried to update.

therein a day ago | parent [-]

> What does that even mean?

I think it is clear what he meant and it is a legitimate question.

If you took a 6-year-old, told him about the things that happened in the last year, and sent him off to work, did he integrate the last year's knowledge? Did he even believe it or find it true? If that information conflicted with what he knew before, how do we know he will take the most recent thing he was told as the new information? Will he continue parroting what he knew before this last upload? These are legitimate questions we have about our black box of statistics.

aziaziazi a day ago | parent [-]

Interesting, I read GGP as:

If they stopped learning (=including) on March 31 and something popped up on the internet on March 30 (lib update, new Nobel, whatever), there’s a good chance it got scrapped because they probably don’t scrape everything in one day (do they?).

That isn’t mutually exclusive with your answer I guess.

edit: thanks to adolph for pointing out the typo.

adolph a day ago | parent [-]

Maybe I'm old school but isn't the date the last date for inclusion in the training corpus and not the date "they stopped training"?

simlevesque a day ago | parent | prev [-]

You might be able to ask it what it knows.

minimaxir a day ago | parent | next [-]

So something's odd there. I asked it "Who won Super Bowl LIX and what was the winning score?" which was in February and the model replied "I don't have information about Super Bowl LIX (59) because it hasn't been played yet. Super Bowl LIX is scheduled to take place in February 2025.".

ldoughty a day ago | parent [-]

With LLMs, if you repeat something often enough, it becomes true.

I imagine there's a lot more data pointing to the Super Bowl being upcoming than to the Super Bowl concluding with the score.

Gonna be scary when bot farms are paid to make massive amounts of politically motivated false content (specifically) targeting future LLM training.

dr-smooth a day ago | parent | next [-]

I'm sure it's already happening.

gosub100 a day ago | parent | prev [-]

A lot of people are forecasting the death of the Internet as we know it. The financial incentives are too high and the barrier to entry is too low. If you can build bots that maybe only generate a fraction of a dollar per day (referring people to businesses, posting spam for elections, poisoning data collection/web crawlers), someone in a poor country will do it. Then the bots themselves have value, which creates a market for specialists in fake profile farming.

I'll go a step further and say this is not a problem but a boon to tech companies. Then they can sell you a "premium service" to a walled garden of only verified humans or bot-filtered content. The rest of the Internet will suck and nobody will have incentive to fix it.

birn559 a day ago | parent [-]

I believe identity providers will become even more important in the future as a consequence, and that there will be an arms race (hopefully) ending with most people providing them some kind of official ID.

gosub100 a day ago | parent [-]

It might slow them down, but integration of the government into online accounts will have its own set of consequences. Some good, of course. But it can chill free speech and become a huge liability for whoever collects and verifies the IDs. One hack (say, of the government ID database) would spoil the whole system.

birn559 17 hours ago | parent [-]

I agree, this would have very bad consequences for free speech and democracy. The next step after that would be the reestablishment of pseudonymous platforms, going full circle.

krferriter a day ago | parent | prev | next [-]

Why would you trust it to accurately say what it knows? It's all statistical processes. There's no "but actually for this question give me only a correct answer" toggle.

retrofuturism a day ago | parent | prev [-]

When I try Claude Sonnet 4 via web:

https://claude.ai/share/59818e6c-804b-4597-826a-c0ca2eccdc46

>This is a topic that would have developed after my knowledge cutoff of January 2025, so I should search for information [...]