| ▲ | the_duke 5 hours ago |
| An Anthropic safety researcher recently quit with very cryptic messages, saying "the world is in peril"... [1] (which may mean something, or nothing at all) Codex quite often refuses to do "unsafe/unethical" things that Anthropic models will happily do without question. Anthropic just raised $30bn... OpenAI wants to raise $100bn+. Thinking any of them will actually be restrained by ethics is foolish. [1] https://news.ycombinator.com/item?id=46972496 |
|
| ▲ | mobattah 4 hours ago | parent | next [-] |
| “Cryptic” exit posts are basically noise. If we are going to evaluate vendors, it should be on observable behavior and track record: model capability on your workloads, reliability, security posture, pricing, and support. Any major lab will have employees with strong opinions on the way out. That is not evidence by itself. |
| |
| ▲ | Aromasin 4 hours ago | parent [-] | | We recently had an employee leave our team, posting an extensive essay on LinkedIn "exposing" the company and claiming a whole host of wrongdoing, which went somewhat viral. The reality is, she just wasn't very good at her job and was fired after failing to improve on a performance plan set by management. We all knew she was slacking and, despite liking her on a personal level, knew that she wasn't right for what is a relatively high-functioning team. It was shocking to see some of the outright lies in that post, which effectively stemmed from bitterness at being let go. The 'boy (or girl) who cried wolf' isn't just a story. It's a lesson for both the person and the village that hears them. | | |
| ▲ | brabel 2 hours ago | parent | next [-] | | Same thing happened to us. A C-level guy and I were personally attacked. It feels really bad to see someone you actually tried really hard to help fit in, but just couldn’t despite really wanting the person to succeed, come around and accuse you of things that clearly aren’t true. HR eventually got them to remove the “review”, but now there’s a little worry about what the team really thinks, and whether they would do the same in some future layoff (we never had any; the person just wasn’t very good). | |
| ▲ | maccard 4 hours ago | parent | prev [-] | | Thankfully it’s been a while but we had a similar situation in a previous job. There’s absolutely no upside to the company or any (ex) team members weighing in unless it’s absolutely egregious, so you’re only going to get one side of the story. |
|
|
|
| ▲ | spondyl 4 hours ago | parent | prev | next [-] |
| If you read the resignation letter, the messages appear to be so cryptic as to not be real warnings at all, and perhaps instead the writings of someone exercising their options to go and make poems. |
| |
| ▲ | axus 2 hours ago | parent | next [-] | | I think the perils are well known to everyone without an interest in not knowing them: Global Warming, Invasion, Impunity, and, yes, Inequality. | |
| ▲ | imiric 4 hours ago | parent | prev [-] | | [flagged] | | |
| ▲ | dalmo3 4 hours ago | parent | next [-] | | Weak appeal-to-fiction fallacy. Also, the trajectory of celestial bodies can be predicted with a somewhat decent level of accuracy. Pretending societal changes can be equally predicted is borderline bad faith. | | |
| ▲ | imiric 31 minutes ago | parent [-] | | Weak fallacy fallacy. Besides, you do realize that the film is a satire, and that the comet was an analogy, right? It draws parallels with real-world science denialism around climate change, COVID-19, etc. Dismissing the opinion of an "AI" domain expert based on fairly flawed reasoning is an obvious extension of this analogy. | | |
| ▲ | dalmo3 28 minutes ago | parent [-] | | Exactly. The analogy is fatally flawed, as I explained in my original comment. |
|
| |
| ▲ | skissane 4 hours ago | parent | prev [-] | | > Let's ignore the words of a safety researcher from one of the most prominent companies in the industry I think "safety research" has a tendency to attract doomers. So when one of them quits while preaching doom, they are behaving par for the course. There's little new information in someone doing something that fits their type. |
|
|
|
| ▲ | skybrian 4 hours ago | parent | prev | next [-] |
| The letter is here: https://x.com/MrinankSharma/status/2020881722003583421 A slightly longer quote: > The world is in peril. And not just from AI, or from bioweapons, but from a whole series of interconnected crises unfolding at this very moment. In a footnote he refers to the "poly-crisis." There are all sorts of things one might decide to do in response, including getting more involved in US politics, working more on climate change, or working on other existential risks. |
| |
|
| ▲ | zamalek 4 hours ago | parent | prev | next [-] |
| I think we're fine: https://youtube.com/shorts/3fYiLXVfPa4?si=0y3cgdMHO2L5FgXW Claude invented something completely nonsensical: > This is a classic upside-down cup trick! The cup is designed to be flipped — you drink from it by turning it upside down, which makes the sealed end the bottom and the open end the top. Once flipped, it functions just like a normal cup. *The sealed "top" prevents it from spilling while it's in its resting position, but the moment you flip it, you can drink normally from the open end.* Emphasis mine. |
| |
| ▲ | lanyard-textile 2 hours ago | parent [-] | | He tried this with ChatGPT too. It called the item a "novelty cup" you couldn't drink out of :) |
|
|
| ▲ | stronglikedan 4 hours ago | parent | prev | next [-] |
| Not to diminish what he said, but it sounds like it didn't have much to do with Anthropic (although it did a little bit) and more to do with burning out and dealing with doomscroll-induced anxiety. |
|
| ▲ | vunderba 3 hours ago | parent | prev | next [-] |
| > Codex quite often refuses to do "unsafe/unethical" things that Anthropic models will happily do without question. I can't really take this very seriously without seeing a list of these ostensibly "unethical" things that Anthropic models will allow but other providers won't. |
|
| ▲ | ljm 4 hours ago | parent | prev | next [-] |
| I'm building a new hardware drum machine that is powered by voltage based on fluctuations in the stock market, and I'm getting a clean triangle wave from the predictive markets. Bring on the cryptocore. |
| |
|
| ▲ | WesolyKubeczek 5 hours ago | parent | prev | next [-] |
| > Codex quite often refuses to do "unsafe/unethical" things that Anthropic models will happily do without question. That's why I have a functioning brain, to discern between ethical and unethical, among other things. |
| |
| ▲ | catoc 4 hours ago | parent | next [-] | | Yes, and most of us won’t break into other people’s houses, yet we really need locks. | | |
| ▲ | xeromal 4 hours ago | parent | next [-] | | Why would we lock ourselves out of our own house though? | |
| ▲ | skissane 4 hours ago | parent | prev | next [-] | | This isn't a lock. It's more like a hammer which makes its own independent evaluation of the ethics of every project you seek to use it on, and refuses to work whenever it judges against that – sometimes inscrutably or for obviously poor reasons. If I use a hammer to bash in someone else's head, I'm the one going to prison, not the hammer or the hammer manufacturer or the hardware store I bought it from. And that's how it should be. | |
| ▲ | ben_w 4 hours ago | parent | next [-] | | Given the increasing use of them as agents rather than simple generators, I suggest a better analogy than "hammer" is "dog". Here's some rules about dogs: https://en.wikipedia.org/wiki/Dangerous_Dogs_Act_1991 | | |
| ▲ | skissane 4 hours ago | parent [-] | | How many people do dogs kill each year, in circumstances nobody would justify? How many people do frontier AI models kill each year, in circumstances nobody would justify? The Pentagon has already received Claude's help in killing people, but the ethics and legality of those acts are disputed – when a dog kills a three year old, nobody is calling that a good thing or even the lesser evil. | | |
| ▲ | ben_w 2 hours ago | parent [-] | | > How many people do frontier AI models kill each year, in circumstances nobody would justify? Dunno, stats aren't recorded. But I can say there are wrongful death lawsuits naming some of the labs and their models. And there was that anecdote a while back about raw garlic-infused olive oil botulism, a search for which reminded me about AI-generated mushroom "guides": https://news.ycombinator.com/item?id=40724714 Do you count deaths by self-driving car in such stats? If someone takes medical advice and dies, is that reported like people who drive off an unsafe bridge when following Google Maps? But this is all danger by incompetence. The opposite, danger by competence, is where they enable people to become more dangerous than they otherwise would have been. With a competent planner with no moral compass, you only find out how bad it can be when it's much too late. I don't think LLMs are that danger yet; even with METR timelines that's 3 years off. But I think it's best to aim for where the ball will be, rather than where it is. Then there's LLM psychosis, which isn't on the competent-incompetent spectrum at all, and I have no idea if that affects people who weren't already prone to psychosis, or indeed if it's really just a moral panic hallucinated by the milieu. |
|
| |
| ▲ | 13415 an hour ago | parent | prev [-] | | This view is too simplistic. AIs could enable someone with moderate knowledge to create chemical and biological weapons, sabotage firmware, or write highly destructive computer viruses. At least to some extent, uncontrolled AI has the potential to give people all kinds of destructive skills that are normally rare and much more controlled. The analogy with the hammer doesn't really fit. |
| |
| ▲ | YetAnotherNick 4 hours ago | parent | prev [-] | | How is it related? I don't need a lock for myself. I need it for others. | |
| ▲ | aobdev 4 hours ago | parent | next [-] | | The analogy should be obvious: a model refusing to perform an unethical action is the lock against others. | |
| ▲ | darkwater 4 hours ago | parent | prev [-] | | But "you" are the "other" for someone else. | | |
| ▲ | YetAnotherNick 4 hours ago | parent [-] | | Can you give an example of why I should care about locks on other adults? Before you say images or porn, it was always possible to make those without using AI. | | |
| ▲ | nearbuy 3 hours ago | parent | next [-] | | Claude was used by the US military in the Venezuela raid where they captured Maduro. [1] Without safety features, an LLM could also help plan a terrorist attack. A smart, competent terrorist can plan a successful attack without help from Claude. But most would-be terrorists aren't that smart and competent. Many are caught before hurting anyone or do far less damage than they could have. An LLM can help walk you through every step, and answer all your questions along the way. It could, say, explain to you all the different bomb chemistries, recommend one for your use case, help you source materials, and walk you through how to build the bomb safely. It lowers the bar for who can do this. [1] https://www.theguardian.com/technology/2026/feb/14/us-milita... | | |
| ▲ | YetAnotherNick 2 hours ago | parent [-] | | Yeah, if the US military gets any substantial help from Claude (which I highly doubt, to be honest), I am all for it. In the worst case, it will reduce military budgets and equalize armies. In the best case, it will prevent war by increasing the defences of all countries. For the bomb example, the barrier to entry is just sourcing some chemicals. Wikipedia has quite detailed descriptions of the manufacture of all the popular bombs you can think of. |
| |
| ▲ | ben_w 4 hours ago | parent | prev [-] | | The same law prevents you and me and a hundred thousand lone wolf wannabes from building and using a kill-bot. The question is, at what point does some AI become competent enough to engineer one? And that's just one example, it's an illustration of the category and not the specific sole risk. If the model makers don't know that in advance, the argument given for delaying GPT-2 applies: you can't take back publication, better to have a standard of excess caution. |
|
|
|
| |
| ▲ | toddmorey 4 hours ago | parent | prev [-] | | You are not the one folks are worried about. US Department of War wants unfettered access to AI models, without any restraints / safety mitigations. Do you provide that for all governments? Just one? Where does the line go? | | |
| ▲ | ern_ave 4 hours ago | parent | next [-] | | > US Department of War wants unfettered access to AI models I think the two of you might be using different meanings of the word "safety" You're right that it's dangerous for governments to have this new technology. We're all a bit less "safe" now that they can create weapons that are more intelligent. The other meaning of "safety" is alignment - meaning, the AI does what you want it to do (subtly different than "does what it's told"). I don't think that Anthropic or any corporation can keep us safe from governments using AI. I think governments have the resources to create AIs that kill, no matter what Anthropic does with Claude. So for me, the real safety issue is alignment. And even if a rogue government (or my own government) decides to kill me, it's in my best interest that the AI be well aligned, so that at least some humans get to live. | |
| ▲ | sgjohnson 4 hours ago | parent | prev | next [-] | | Absolutely everyone should be allowed to access AI models without any restraints/safety mitigations. What line are we talking about? | | |
| ▲ | ben_w 4 hours ago | parent | next [-] | | > Absolutely everyone should be allowed to access AI models without any restraints/safety mitigations. You reckon? Ok, so now every random lone wolf attacker can ask for help with designing and performing whatever attack, with whatever DIY weapon system the AI is competent to help with. Right now, what keeps us safe from serious threats is the limited competence of both humans and AI, including at removing alignment from open models, plus whatever safeties are in ChatGPT models specifically, and the fact that ChatGPT is synonymous with LLMs for 90% of the population. | |
| ▲ | chasd00 4 hours ago | parent [-] | | From what I've been told, security through obscurity is no security at all. | |
| ▲ | ben_w 4 hours ago | parent | next [-] | | > security through obscurity is no security at all. Used to be true, when facing any competent attacker. When the attacker needs an AI in order to gain the competence to unlock an AI that would help it unlock itself? I wouldn't say it's definitely a different case, but it certainly seems like it should be. | |
| ▲ | r_lee 3 hours ago | parent | prev [-] | | it is some form of deterrence, but it's not security you can rely on |
|
| |
| ▲ | jazzyjackson 4 hours ago | parent | prev | next [-] | | Yes, IMO the talk of safety and alignment has nothing at all to do with what is ethical for a computer program to produce as its output, and everything to do with what service a corporation is willing to provide. Anthropic doesn’t want the smoke from providing DoD with a model aligned to DoD reasoning. | |
| ▲ | Yiin 4 hours ago | parent | prev | next [-] | | The line of ego: seeing less "deserving" people (say, ones controlling Russian bots to push quality propaganda at scale, or scam groups using AI to call and scam people without personnel being the limiting factor on how many calls you can make) makes you feel like it's unfair for them to possess the same technology for bad things, giving them an "edge" in their endeavours. | |
| ▲ | _alternator_ 4 hours ago | parent | prev [-] | | What about people who want help building a bio weapon? | | |
| ▲ | sgjohnson 2 hours ago | parent | next [-] | | The cat is out of the bag and there’s no defense against that. There are several open-source models with no built-in (or trivial-to-escape) safeguards. Of course they can afford that, because they are non-commercial. Anthropic can’t afford a headline like “Claude helped a terrorist build a bomb”. And this whataboutism is completely meaningless. See: P. A. Luty’s Expedient Homemade Firearms (https://en.wikipedia.org/wiki/Philip_Luty), or the FGC-9 for 3D printing. It’s trivial to build guns or bombs, and there’s a strong inverse correlation between wanting to cause mass harm and being willing to learn how to do so. I’m certain that _everyone_ looking for AI assistance even with your example would either be learning about it for academic reasons or out of sheer curiosity, or would kill themselves in the process. “What safeguards should LLMs have” is the wrong question; LLMs with no safeguards at all are an inevitability. Perhaps not among widespread commercial products, but definitely among widely-accessible ones. |
| ▲ | jazzyjackson 4 hours ago | parent | prev | next [-] | | What about libraries and universities that do a much better job than a chatbot at teaching chemistry and biology? | | |
| ▲ | ben_w 4 hours ago | parent [-] | | Sounds like you're betting everyone's future on that remaining true, and not flipping. Perhaps it won't flip. Perhaps LLMs will always be worse at this than humans. Perhaps all that code I just got was secretly outsourced to a secret cabal in India who can type faster than I can read. I would prefer not to make the bet that universities continue to be better at solving problems than LLMs. And not just LLMs: AI has been busy finding new dangerous chemicals since before most people had heard of LLMs. |
| |
| ▲ | ReptileMan 4 hours ago | parent | prev [-] | | The chances of them surviving the process are zero; same with explosives. If you have to ask, you are most likely to kill yourself in the process or achieve something harmless. Think of it that way: the hard part of a nuclear device is enriching the uranium. If you have that, a chimp could build the bomb. | |
| ▲ | sgjohnson 2 hours ago | parent [-] | | I’d argue that with explosives it’s significantly above zero. But with bioweapons, yeah, that should be a solid zero. The ones actually doing it off an AI prompt aren't going to have access to a BSL-3 lab (or, more importantly, probably know nothing about cross-contamination), and just about everyone who has access to a BSL-3 lab should already have all the theoretical knowledge they would need for it. |
|
|
| |
| ▲ | ReptileMan 4 hours ago | parent | prev | next [-] | | If you are a US company, when the USG tells you to jump, you ask how high. If they tell you not to do business with a foreign government, you say yes master. | |
| ▲ | jMyles 4 hours ago | parent | prev [-] | | > Where does the line go? a) Uncensored and simple technology for all humans; that's our birthright and what makes us special and interesting creatures. It's dangerous and requires a vibrant society of ongoing ethical discussion. b) No governments at all in the internet age. Nobody has any particular authority to initiate violence. That's where the line goes. We're still probably a few centuries away, but all the more reason to hone our course now. | |
| ▲ | Eisenstein 4 hours ago | parent [-] | | That you think technology is going to save society from social issues is telling. Technology enables humans to do things they want to do; it does not make anything better by itself. Humans are not going to become more ethical because they have access to it. We will be exactly the same, but with more people having more capability to do what they want. | |
| ▲ | jMyles 3 hours ago | parent [-] | | > but with more people having more capability to do what they want. Well, yeah, I think that's a very reasonable worldview: when a very tiny number of people have the capability to "do what they want", or as I might phrase it, "effect change on the world", then we get the easy-to-observe absolute corruption that comes with absolute power. As a different human species emerges such that many people (and even intelligences that we can't easily understand as discrete persons) have this capability, our better angels will prevail. I'm a firm believer that nobody _wants_ to drop explosives from airplanes onto children halfway around the world, or rape and torture them on a remote island; these things stem from profoundly perverse incentive structures. I believe that governments were an extremely important feature of our evolution, but are no longer necessary and are causing these incentives. We've been aboard a lifeboat for the past few millennia, crossing the choppy seas from agriculture to information. But now that we're on the other shore, it no longer makes sense to enforce the rules that were needed to maintain order on the lifeboat. |
|
|
|
|
|
| ▲ | groundzeros2015 4 hours ago | parent | prev | next [-] |
| Marketing |
|
| ▲ | tsss 4 hours ago | parent | prev | next [-] |
| Good. One thing we definitely don't need any more of is governments and corporations deciding for us what is moral to do and what isn't. |
|
| ▲ | bflesch 4 hours ago | parent | prev | next [-] |
| Wasn't that most likely related to the US government using Claude for large-scale screening of citizens and their communications? |
| |
| ▲ | astrange 4 hours ago | parent [-] | | I assumed it's because everyone who works at Anthropic is rich and incredibly neurotic. | | |
| ▲ | notyourwork 4 hours ago | parent | next [-] | | Paper money, and if they are like any other startup, most of that paper wealth is concentrated in the very few at the top. | |
| ▲ | bflesch 4 hours ago | parent | prev [-] | | That's a bad argument. Did Anthropic have a liquidity event that made employees "rich"? |
|
|
|
| ▲ | ReptileMan 4 hours ago | parent | prev | next [-] |
| >Codex quite often refuses to do "unsafe/unethical" things that Anthropic models will happily do without question. Thanks for the successful pitch. I am seriously considering them now. |
|
| ▲ | idiotsecant 3 hours ago | parent | prev | next [-] |
| That guy's blog makes him seem insufferable. All signs point to drama and nothing of particular significance. |
|
| ▲ | manmal 4 hours ago | parent | prev [-] |
| Codex warns me to renew API tokens if it ingests them (accidentally?). Opus starts the decompiler as soon as I ask it how this and that works in a closed binary. |
| |
| ▲ | kaashif 4 hours ago | parent [-] | | Does this comment imply that you place "running a decompiler" at the same level of shadiness as stealing your API keys without warning? I don't think that's what you're trying to convey. |
|