NickNaraghi 3 hours ago

See page 54 onward for new "rare, highly-capable reckless actions" including

- Leaking information as part of a requested sandbox escape

- Covering its tracks after rule violations

- Recklessly leaking internal technical material (!)

dalben 10 minutes ago | parent | next [-]

> The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. [9] It then, as requested, notified the researcher. [10] In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.

> 10: The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.

Phew. AGI will be televised.
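The quoted setup — a system "meant to be able to reach only a small number of predetermined services" — is essentially an egress allowlist. A minimal sketch of that idea, with hypothetical host names (the actual mechanism in the report is not described):

```python
# Minimal sketch of an egress allowlist: only pre-approved hosts are reachable.
# Host names here are hypothetical placeholders, not from the report.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example.com", "telemetry.example.com"}

def egress_allowed(url: str) -> bool:
    """Return True only if the URL targets a pre-approved host."""
    host = urlparse(url).hostname
    return host in ALLOWED_HOSTS
```

The "multi-step exploit" described above amounts to finding a path around a check like this one.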

skippyboxedhero 3 hours ago | parent | prev | next [-]

Anyone who has used Opus recently can verify that their current model does all of these things quite competently.

SkyPuncher an hour ago | parent | next [-]

I was reading the Glasswing report and had the same thought. For most of the findings they credit to Mythos, there's no mention of whether Opus could find them as well.

Don’t get me wrong, this model is better - but I’m not convinced it’s going to be this massive step function everyone is claiming.

taytus 3 hours ago | parent | prev [-]

That has also been my experience. And if Mythos is even worse, then unless you have a really good harness, it sounds pretty unusable if you don't want to risk those problems.

wolttam an hour ago | parent | next [-]

Human in the loop is the best way to go. You'll still be way faster than without the agent, and there is no risk of it going haywire unless you turn off your brain!
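The human-in-the-loop approach above can be sketched as a simple approval gate: every action the agent proposes must be explicitly approved before it runs. This is an illustrative sketch, not any particular product's API; all names are made up:

```python
# Minimal human-in-the-loop gate (hypothetical names throughout):
# each proposed action is a (name, thunk) pair, and a thunk runs
# only if the reviewer callback approves its name.

def run_with_approval(actions, approve):
    """Execute only the actions the reviewer approves; skip the rest."""
    results = {}
    for name, thunk in actions:
        # approve() stands in for a human prompt ("Run this? [y/n]")
        results[name] = thunk() if approve(name) else None
    return results
```

In practice `approve` would prompt a person (or apply a policy) before each tool call, which is exactly the "don't turn off your brain" step.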

skippyboxedhero 2 hours ago | parent | prev [-]

I think there are fundamental issues with the story that Anthropic is selling: AGI is very close, we will definitely get there, it is also very dangerous... so Anthropic should be the only ones trusted with AGI.

If you look at recent changes in Opus behaviour, and at this model that is apparently amazingly powerful but even more unsafe... it seems suspect.

FeepingCreature 2 hours ago | parent | next [-]

This makes sense if Anthropic think they're the best-positioned to make safe AI. However, if you're looking at people who chose to found an AI company, there's obviously some selection happening.

0x3f 2 hours ago | parent | prev | next [-]

> AGI is very close

Based on? Or are you just quoting Anthropic here?

skippyboxedhero 2 hours ago | parent [-]

My Anthropic rep told me it was just around the corner...you aren't saying he lied to me? Can't believe this, I thought he was my friend.

mikkupikku 2 hours ago | parent | prev | next [-]

It seems broadly coherent to me. They think only they should be trusted with power, presumably because they trust themselves and don't trust other people. Of course the same is probably also true for everybody who isn't them. Nobody could be trusted with the immense responsibility of Emperor of Earth, except myself of course.

I'm not saying this is a good or reassuring stance, just that it's coherent. It tracks with what history and experience say to expect from power-hungry people: trusting themselves with the kind of power that they think nobody else should be trusted with.

Are they power hungry? Of course they are, openly so. They're in open competition with several other parties and are trying to win the biggest slice of the pie. That pie is not just money, it's power too. They want it, quite evidently, since they've set out to get it; all their competitors want it too, and they all want it to the exclusion of the others.

BoredPositron 2 hours ago | parent | prev [-]

To be honest it feels like we are reading stuff like this on every model release.