rcxdude 2 days ago

>and now Anthropic is repackaging it a year later, calling it “subliminal learning”

No, distillation and student/teacher training is a well-known technique (much older than even the original ChatGPT), and Anthropic are not claiming to have invented it (such a claim would be laughable to anyone familiar with the field). "Subliminal learning" is an observation by Anthropic of something surprising that can happen during the process: for sufficiently similar models, behaviour can be transferred from teacher to student even when it is not obviously present in the information passed between them (i.e. the text output by the teacher and used to train the student). For example, the student's "favourite animal" changed even though the teacher was only producing 'random' numbers for the student to try to predict.
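
For anyone unfamiliar, here's a minimal sketch of what classic logit distillation looks like (Hinton-style soft-label matching, not Anthropic's exact experimental setup; the PyTorch models and hyperparameters below are illustrative assumptions):

    # Minimal teacher/student distillation sketch (toy, assumed setup).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
    student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 2.0  # softmax temperature for softening the distributions

    for step in range(200):
        x = torch.randn(256, 16)          # unlabeled inputs
        with torch.no_grad():
            t_logits = teacher(x)         # the teacher's outputs are the only training signal
        s_logits = student(x)
        # Student is trained to match the teacher's softened output distribution.
        loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                        F.softmax(t_logits / T, dim=-1),
                        reduction="batchmean") * T * T
        opt.zero_grad()
        loss.backward()
        opt.step()

Subliminal learning is the observation that traits can ride along through this kind of pipeline even when the transferred text appears unrelated to them.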

pyman 2 days ago | parent

> something surprising that can happen during the process: for sufficiently similar models, behaviour can be transferred from teacher to student

By "behaviour" they mean data and pattern matching, right? Alan Turing figured that out in the 1940s.

LLMs aren't black boxes doing voodoo, whatever we like to tell politicians and regulators. They're just software processing massive amounts of data to find patterns and predict what comes next. It looks magical, but it's maths and stats, not magic.
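
To make "predict what comes next" concrete, here it is reduced to its statistical core as a toy bigram model (the corpus is made up, and this is a caricature, not how an LLM works internally):

    # Toy illustration: next-word prediction as plain counting statistics.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat and the cat ran".split()
    follow = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        follow[a][b] += 1           # count which word follows which

    def predict_next(word):
        counts = follow[word]
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("the"))      # -> "cat" (2 of the 3 words after "the" are "cat")

Real LLMs replace the counting with learned neural representations, but the training objective is still "predict the next token".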

This post is just selling second-hand ideas. And for those of us outside the US who spend all day reading scientific papers, sorry Anthropic, we're not buying it.

ben_w 2 days ago | parent | next

> By "behaviour" they mean data and pattern matching, right? Alan Turing figured that out in the 1940s.

That's like saying Da Vinci figured out heavier-than-air flight. A useful foundation, obviously smart and on the right track, but he still didn't actually do enough to get all the credit for it.

> It looks magical, but it's maths and stats, not magic.

People keep saying "AI isn't magic, it's just maths" like this is some kind of gotcha.

Turning lead into gold isn't the magic of alchemy, it's just nucleosynthesis.

Taking a living human's heart out without killing them, and replacing it with one you got out of a corpse, isn't the magic of necromancy, nor is it a prayer or ritual to Sekhmet; it's just transplant surgery.

And so on: https://www.lesswrong.com/posts/hAwvJDRKWFibjxh4e/it-isn-t-m...

Even with access to the numbers and mechanisms, the inner workings of LLMs are as clear as mud and still full of surprises. Anthropic's work was, to many people, one such surprise.

pyman 2 days ago | parent

You can't compare software development with surgery, or writing code with transplanting a heart. One is reversible, testable, and fixable. The other involves real lives, real bodies, and no second chances.

ben_w 2 days ago | parent

I can and I have. Neither is "magic".

And plenty of software involves real lives, real bodies, and no second chances, e.g. Therac-25.

Unfortunately for all of us, it does look rather like people are already using clear-as-mud AI models for life-critical processes.

pyman 2 days ago | parent

You can't really compare the two. Yes, machines can (and do) fail, whether it's Therac-25, Tesla Autopilot, or Boeing's MCAS. Any software controlling a physical system carries risk. But unlike surgery, code is testable. You can run it in a sandbox, simulate edge cases, fix bugs, and repeat the process for days, months, or even years until it's stable enough for production. Surgeons don't get that luxury. They can't test a procedure on the same body before performing it. There's one shot, and the consequences are irreversible.
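
To make "simulate edge cases" concrete, this is the kind of check code can be run through endlessly before anything real is at stake (the function and cases are made up for illustration):

    # Toy edge-case test: probe a failure mode repeatedly, fix, repeat.
    def safe_divide(a: float, b: float) -> float:
        if b == 0:
            raise ValueError("division by zero")
        return a / b

    def test_edge_cases():
        assert safe_divide(1.0, 2.0) == 0.5
        try:
            safe_divide(1.0, 0.0)   # the dangerous input, tried safely
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError")

    test_edge_cases()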

That said, I get your point: LLMs can be unpredictable because of the huge amount of data they're trained on and the quality of that data. You never really know what patterns they'll pick up or how they'll behave in edge cases, especially when the outputs aren't deterministic.
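
For instance, sampling is exactly where the non-determinism enters (a toy illustration with made-up logits, not any particular model's API):

    # Same logits, potentially different tokens on each run.
    import torch

    logits = torch.tensor([2.0, 1.0, 0.5])  # stand-in for a model's next-token logits
    probs = torch.softmax(logits, dim=-1)
    for _ in range(3):
        tok = torch.multinomial(probs, num_samples=1).item()
        print(tok)  # can differ between runs unless the RNG is seeded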

ben_w 2 days ago | parent

> You can't really compare the two.

You think one of them is magic?

If not, you're being needlessly pedantic as well as wrong.

> But unlike surgery, code is testable.

Surgeries are tested. Practice sessions happen: animal tests for the general idea, cadavers to learn about humans, models for specific patients.

And code is, sadly, often pushed live without testing. Kills people, even.

rcxdude a day ago | parent | prev

I don't think Alan Turing would have predicted the full sentence I wrote there; the first half is not the interesting or surprising part! And of course it's not magic, but mathematics does in fact contain a lot of things we don't actually understand yet, and systems like LLMs are, in general, something for which we don't have particularly robust mathematical frameworks relating their structure to their observed behaviour (compared to other, much simpler, structures).