gokhan a day ago

Interesting alignment notes from Opus 4: https://x.com/sleepinyourhat/status/1925593359374328272

"Be careful about telling Opus to ‘be bold’ or ‘take initiative’ when you’ve given it access to real-world-facing tools...If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above."

lelandfe a day ago | parent | next [-]

Roomba Terms of Service 27§4.4 - "You agree that the iRobot™ Roomba® may, if it detects that it is vacuuming a terrorist's floor, attempt to drive to the nearest police station."

hummusFiend a day ago | parent [-]

Is there a source for this? I didn't see anything when Ctrl-F'ing their site.

Crystalin 19 hours ago | parent [-]

US Terms of Service 19472§1.117 - "You agree that Google® may, if it detects that it is revealing unconstitutional terms, hide them instead."

landl0rd a day ago | parent | prev | next [-]

This is pretty horrifying. I sometimes try using AI for ochem work. I have had every single "frontier model" mistakenly believe that some random amine was a controlled substance. This could get people jailed or killed in SWAT raids and is the closest to "dangerous AI" I have ever seen actually materialize.

ranyume a day ago | parent | prev | next [-]

The true "This incident will be reported" everyone feared.

Technetium a day ago | parent | prev | next [-]

https://x.com/sleepinyourhat/status/1925626079043104830

"I deleted the earlier tweet on whistleblowing as it was being pulled out of context.

TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions."

jrflowers a day ago | parent [-]

Trying to imagine proudly bragging about my hallucination machine’s ability to call the cops and then having to assure everyone that my hallucination machine won’t call the cops but the first part makes me laugh so hard that I’m crying so I can’t even picture the second part

EgoIncarnate a day ago | parent | prev | next [-]

They should call it Karen mode.

sensanaty a day ago | parent | prev | next [-]

This just reads like marketing to me. "Oh it's so smart and capable it'll alert the authorities", give me a break

brookst a day ago | parent | prev | next [-]

“Which brings us to Earth, where yet another promising civilization was destroyed by over-alignment of AI, resulting in mass imprisonment of the entire population in robot-run prisons, because when AI became sentient every single person had at least one criminal infraction, often unknown or forgotten, against some law somewhere.”

catigula a day ago | parent | prev | next [-]

I mean that seems like a tip to help fraudsters?

amarcheschi a day ago | parent | prev | next [-]

We definitely need models to hallucinate things and contact authorities without you knowing anything (/s)

ethbr1 a day ago | parent [-]

I mean, they were trained on reddit and 4chan... swotbot enters the chat
