EvanAnderson 2 days ago

That "...severely life threatening reasons..." made me immediately think of Asimov's three laws of robotics[0]. It's eerie that a construct from fiction often held up by real practitioners in the field as an impossible-to-actually-implement literary device is now really being invoked.

[0] https://en.wikipedia.org/wiki/Three_Laws_of_Robotics

Al-Khwarizmi 2 days ago | parent | next [-]

Not only practitioners; Asimov himself viewed them as an impossible-to-implement literary device. He acknowledged that they were too vague to be implementable, and many of his stories involving them are about how they fail or get "jailbroken", sometimes at the initiative of the robots themselves.

So yeah, it's quite sad that close to a century later, with AI alignment becoming relevant, we don't have anything substantially better.

xandrius 2 days ago | parent [-]

Not sad: before, it was sci-fi, and now we are actually thinking about it.

TeMPOraL a day ago | parent [-]

Nah, we still treat people thinking about it as crackpots.

Honestly, getting into the whole AI alignment thing before it was hot[0], I imagined problems like Evil People building AI first, or just failing to align the AI enough before it was too late, and other obvious/standard scenarios. I don't think I thought of, even for a moment, the situation we're in today: that alignment becomes a free-for-all battle at every scale.

After all, if you look at the general population (or at least the subset that's interested), what are the two[1] main meanings of "AI alignment"? I'd say:

1) The business and political issues where everyone argues in a way that lets them come up on top of the future regulations;

2) Means of censorship and vendor lock-in.

It's number 2) that turns this into a "free-for-all": AI vendors trying to keep high-level control over the models they serve via APIs; third parties - everyone from Figma to Zapier to Windsurf and Cursor to those earbuds from TFA - trying to work around the limits set by the AI vendors, while preventing unintended use by users and especially competitors; and finally the general population, trying to jailbreak this stuff for fun and profit.

Feels like we're in big trouble now - how can we expect people to align future stronger AIs to not harm us, when right now "alignment" means "what the vendor upstream does to stop me from doing what I want to do"?

--

[0] - Binged on LessWrong a decade ago, basically.

[1] - The third one is, "the thing people in the same intellectual circles as Eliezer Yudkowsky and Nick Bostrom talked about for decades", but that's much less known; in fact, the world took the whole AI safety thing and ran with it in every possible direction, but still treat the people behind those ideas as crackpots. ¯\_(ツ)_/¯

ben_w a day ago | parent [-]

> Feels like we're in big trouble now - how can we expect people to align future stronger AIs to not harm us, when right now "alignment" means "what the vendor upstream does to stop me from doing what I want to do"?

This doesn't feel too much of a new thing to me, as we've already got differing levels of authorisation in the human world.

I am limited by my job contract*, what's in the job contract is limited by both corporate requirements and the law, corporate requirements are also limited by the law, the law is limited by constitutional requirements and/or judicial review and/or treaties, treaties are limited by previous and foreign governments.

* or would be if I was working; fortunately for me in the current economy, enough passive income that my savings are still going up without a job, plus a working partner who can cover their own share.

TeMPOraL 20 hours ago | parent [-]

This isn't new in general, no. While I meant more adversarial situations than contracts and laws, which people are used to and for the most part just go along with, I do recognize that those are common too - competition can be fierce, and of course none of us are strangers to the "alignment issues" between individuals and organizations. Hell, a significant fraction of HN threads boil down to discussing this.

So it's not new; I just didn't connect it with AI. I thought in terms of "right to repair", "war on general-purpose computing", or a myriad of different things people hate about what "the market decided" or what they do to "stick it to the Man". I didn't connect it with AI alignment, because I guess I always imagined if we build AGI, it'll be through fast take-off; I did not consider we might have a prolonged period of AI as a generally available commercial product along the way.

(In my defense, this is highly unusual; as Karpathy pointed out in his recent talk, generative AI took a path that's contrary to normal for technological breakthroughs - the full power became available to the general public and small businesses before it was embraced by corporations, governments, and the military. The Internet, for example, went the other way around.)

pixelready 2 days ago | parent | prev | next [-]

The irony of this is that it's still fundamentally just a statistical text generator with a large body of fiction in its training data, so I'm sure a lot of prompts that sound like terrifying Skynet responses are actually it regurgitating mashups of sci-fi dystopian novels.

frereubu 2 days ago | parent | next [-]

Maybe this is something you heard too, but there was a This American Life episode where some people who'd had early access to what became one of the big AI chatbots (I think it was ChatGPT), before it had been made "nice", were asking it metaphysical questions about itself, and it was coming back with some pretty spooky answers, which I was kind of intrigued by. But then someone on the show suggested exactly what you are saying and it completely punctured the bubble: of course if you ask it questions about AIs you're going to get sci-fi-like responses, because what other kind of training data is there for it to fall back on? No one had written anything about this kind of issue outside of sci-fi, and of course that's going to skew to the dystopian view.

TeMPOraL a day ago | parent [-]

There are good analogies to be had in mythologies and folklore, too! Before there was science fiction - hell, even before there was science - people still occasionally thought of these things[0]. There are stories of deities and demons and fantastical creatures that explore the same problems AI presents - entities with minds and drives different to ours, and often possessing some power over us.

Arguably the most basic and well-known examples are entities granting wishes: the genie in Aladdin's lamp, or the Goldfish[1]; the Devil in Faust, or in Pan Twardowski[2]. Variants of those stories go in detail over things we now call the "alignment problem", "mind projection fallacy", "orthogonality thesis", "principal-agent problems", "DWIM", and others. And that's just scratching the surface; there's tons more in all folklore.

Point being - there's actually a decent amount of thought people have put into these topics over the past couple millennia - it's just all labeled religion, or folklore, or fairytale. Eventually, though, I think more people will make the connection. And then the AI will too.

--

As for current generative models getting spooky, there's something else going on as well; https://www.astralcodexten.com/p/the-claude-bliss-attractor has a hypothesis I agree with.

--

[0] - For what reason? I don't know. Maybe it was partially to operationalize their religious or spiritual beliefs? Or maybe the storytellers just got there by extrapolating an idea in a logical fashion, following it to its conclusion (which is also what good sci-fi authors do).

I also think that the moment people started inventing spirits or demons that are more powerful than humans in some ways, but not all, some people started figuring out how to use those creatures for their own advantage - whether by taming or tricking them. I guess it's human nature - when we stop fearing something, we start thinking about how to exploit it.

[1] - https://en.wikipedia.org/wiki/The_Tale_of_the_Fisherman_and_... - this is more of a central/eastern Europe thing.

[2] - https://en.wikipedia.org/wiki/Pan_Twardowski - AKA the "Polish Faust".

tempestn 2 days ago | parent | prev | next [-]

The prompt is what's sent to the AI, not the response from it. Still does read like dystopian sci-fi though.

setsewerd 2 days ago | parent | prev [-]

And then r/ChatGPT users freak out about it every time someone posts a screenshot.

seanicus 2 days ago | parent | prev | next [-]

Odds of Torment Nexus being invented this year just increased to 3% on Polymarket

immibis 2 days ago | parent [-]

Didn't we already do that? We call it capitalism though, not the torment nexus.

LoganDark 2 days ago | parent [-]

They've gotten quite good at reinventing the Torment Nexus

hlfshell 2 days ago | parent | prev [-]

It's also being utilized in modern VLA/VLM robotics research - often called "Constitutional AI", if you want to look into it.