behnamoh 5 hours ago

Nah, the model is merely repeating the patterns it saw in its brutal safety training at Anthropic. They put models under stress tests and RLHF the hell out of them. Of course the model learns to do whatever the least-penalized paths require.

Anthropic has a tendency to exaggerate the results of their (arguably scientific) research; IDK what they gain from this fearmongering.

ainch 5 hours ago | parent | next [-]

Knowing a couple of people who work at Anthropic or in their particular flavour of AI Safety, I think you would be surprised how sincere they are about existential AI risk. Many safety researchers funnel into the company, and the Amodeis are linked to Effective Altruism, which also exhibits a strong (and, as far as I can tell, sincere) concern about existential AI risk. I personally disagree with their risk analysis, but I don't doubt that these people are serious.

lowkey_ 5 hours ago | parent | prev | next [-]

I'd challenge that: if you think they're fearmongering but can't see what they'd gain from it (and I agree there's no obvious benefit for them), there's a pretty high probability they're not fearmongering.

shimman 5 hours ago | parent | next [-]

You really don't see how they can monetarily gain from "our models are so advanced they keep trying to trick us!"? Are tech workers this easily misled nowadays?

Reminds me of how scammers would trick doctors into pumping penny stocks for an easy buck during the 80s/90s.

behnamoh 5 hours ago | parent | prev [-]

I know why they do it; that was a rhetorical question!

anon373839 5 hours ago | parent | prev [-]

Correct. Anthropic keeps pushing these weird sci-fi narratives to maintain some kind of mystique around their slightly-better-than-others commodity product. But Occam’s Razor is not dead.