I like Claude Code a lot, but I think it sets a dangerous precedent to put in guardrails that return a response from a prompt the system modified in real time in order to subvert the original intent.

Fail cleanly. Anything else makes it too difficult to rely on.

edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.

▲ bs7280 3 hours ago | parent | next [-]

I think the reasonable middle ground Anthropic is trying to achieve is to let the organizations that make the most important and critical software get a head start on cybersecurity before they inevitably allow everyone else the same access.

Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity, because I can't use it to test and harden my own software.

▲ sciencejerk 3 hours ago | parent | next [-]

Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.

Anthropic's guardrails seem to be more about protecting their business (distillation) than they are about public safety.

▲ dnautics 2 hours ago | parent [-]

Public safety is downstream of distillation. If you can distill Claude, then no amount of guardrails on Claude will protect you from what someone can do with it.

▲ zozbot234 8 minutes ago | parent [-]

Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.

▲ wouldbecouldbe 3 hours ago | parent | prev | next [-]

I asked it to analyse my architecture and find any security issues and it did it perfectly, first identified the issues & then fixed them. Not sure why my prompt managed to get through the guardrails

▲ pwython 2 hours ago | parent [-]
		I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc. Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now." Ok fine, I said go for it, and it says: "Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification." Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.

▲ notrealyme123 3 hours ago | parent | prev | next [-]

Exactly. For cybersecurity the failure was visible. It was not visible for "frontier" ML research. The head-start argument for IT security does not hold here.

▲ ryandrake 3 hours ago | parent | prev [-]

I wonder who gets to decide which companies make important and critical software and which ones get the scraps later.

▲ margalabargala 2 hours ago | parent | next [-]

No need to wonder.

The answer is, the organization making the powerful tool. The people in charge of Anthropic.

Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/

You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.

▲ criddell 3 hours ago | parent | prev [-]

That would be Anthropic.

▲ CamperBob2 2 hours ago | parent [-]

Well, Anthropic thinks it should be the Trump administration [1].

This whole business just keeps getting dumber.

1: https://darioamodei.com/post/policy-on-the-ai-exponential

▲ solenoid0937 2 hours ago | parent [-]

Read the actual essay. I cannot possibly imagine how you come to that conclusion unless you're just arguing in bad faith.

▲ CamperBob2 an hour ago | parent [-]

No. You read the actual essay, then explain how we're supposed to interpret this more charitably:

    Frontier AI models, like airplanes, should 
    be required to go through technical testing 
    and auditing, and their release should be 
    blocked or reversed as a threat to public 
    safety if they do not meet high standards 
    of safety. I am grateful to see the Trump 
    administration’s Executive Order move 
    incrementally towards a greater role for 
    government in AI, though Anthropic’s proposal 
    recommends even further action.

They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.

▲ solenoid0937 an hour ago | parent [-]

This is a pretty reasonable statement and I'm not sure how you could interpret this as "sucking up to the admin."

▲ CamperBob2 an hour ago | parent [-]

It's a pretty reasonable statement if you work for Anthropic and are eyeing your stock options nervously and your competitors even more so.

▲ solenoid0937 an hour ago | parent [-]

Everyone that isn't a bitter cynic must be a shill.

▲ mapontosevenths 3 hours ago | parent | prev | next [-]

I agree 100%. Doing a worse job IS an error. It should be treated as such. Or at the very least make that behavior opt-in. The default should not be pretending like nothing happened and just quietly doing a worse job.

Imagine if your healthcare provider just sometimes decided not to read your test results very carefully and you risked death. Now realize that healthcare providers use Claude now and that scenario isn't hypothetical.

▲ largbae 2 hours ago | parent [-]

Especially if your name has any machine learning terms in it. Ah "Mr. Monty Carlo", it says here that you have a UTI, we'll get those kidneys removed ASAP so that won't happen again.

▲ Paracompact an hour ago | parent | prev | next [-]

> Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word.

Only in the same sense that Standard Oil considered themselves the stewards of petroleum. There's benefit of the doubt and then there's just fanfiction. Do not forget that this most aggressive "guardrail" of theirs was not for any safety reason, but just to stop other labs from catching up to their product. They care less about hindering bioweapons, malware, and hate speech than they do free market competition.

▲ jstummbillig 2 hours ago | parent | prev | next [-]

> paternalism isn't a good look.

In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of the doubt.

Are we just concluding "their concerns were never real"? Because that probably runs counter to the things that they have been observing and concluding.

▲ estearum 2 hours ago | parent | next [-]

Basically all critiques of Anthropic's policy moves on these topics boil down to people not believing the fundamental concerns are real, and often then going a step further to conclude that Anthropic doesn't actually believe their concerns either.

If you believe Anthropic believes what they say they do, all of it makes sense.

▲ jcgrillo 2 hours ago | parent | next [-]

But the things they say they believe are insane and totally unmoored from physical, societal, and economic reality. If they actually believe those things they're untrustworthy because they're delusional. If they don't, they're untrustworthy because they're fraudulent. Either way it's not good.

▲ reducesuffering an hour ago | parent [-]

They're not. They're in the eye of the storm and see what's going on the clearest. They were ahead of the curve to be where they're at now, and they're still ahead of the curve for where we're going. All the other heads of labs like Sam Altman and Demis have been saying the same thing since 2015-2016 way before any of this "marketing" would ever have been at play.

▲ jcgrillo an hour ago | parent [-]

There's a simpler explanation that fits the data better: they're lying.

Generally, in the past when tech companies have made outlandish claims that were not backed by evidence, they're later found out to have lied. This is an ancient pattern going back to the dotcom era and before, but for recent examples you need only look back a few years to the web3 era. If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying.

▲ estearum an hour ago | parent [-]

What data does "they're lying" fit better than "they're earnest?"

> If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying

Brilliant framework: Anyone making claims about the future is not just speculating, not just wrong, but they are lying.

▲ shimman 2 hours ago | parent | prev [-]

What are you referring to? The cult belief that they are ushering in a machine god or that they strictly care about making as much money as humanly possible while ignoring the absolutely destructive impacts these companies have had on society?

IMO they are using the cult messaging to distract the public so they take out all the oxygen in the room regarding people that care about the immediate impacts (climate exacerbation, ease of scamming, degrading job prospects, increasing income inequality).

Whenever real concerns are brought up against these companies they are always ignored while claiming the real concern is the fantasy of a machine god turning into skynet.

▲ estearum 2 hours ago | parent [-]

"Why don't they just not participate in the arms race?!" - guy who's never heard of arms races

If they believe they're creating "a machine god" and that it's better it's their machine god than someone else's (which, given the other contenders, I tend to agree with), then all the corollaries you mention are mostly irrelevant.

Whether you believe they're creating a machine god is irrelevant. They believe that they are. It would be helpful if you could create an actually good argument for why they cannot or are not creating a machine god, but it turns out there are no good arguments for why it's impossible to do so. And so... they shall try.

▲ shimman 2 hours ago | parent [-]

Oh okay, they're all just legit crazy and are allowed to poison the environment, murder teenagers, and ruin the material lives of millions for fantasy level delusions. Good to know.

▲ thewebguyd 2 hours ago | parent | prev | next [-]

Then what is it they are trying to guard against, if it's not simply protecting their moat ahead of their IPO?

Because from the outside, their behavior looks like a situation of "What if Microsoft/Apple put controls in place to make it impossible to develop an operating system using their OS?"

▲ estearum 2 hours ago | parent | next [-]

Let's assume that Anthropic believes they're in an arms race to create a potentially dangerous technology, and they believe they're the best ones to win this race.

Unlike nuclear weapons, advancing in this arms race requires actually deploying the product over and over again. Deploying the product makes your advancements visible to your competitors.

It makes complete sense to try to limit the degree to which that's true.

▲ sobellian 2 hours ago | parent [-]

It's an interesting assumption. The idea behind this with nukes was that we'd like to nuke Germany before they could nuke us. Even after we defeated Germany, we nuked Japan even though they had no possibility of getting their own nukes.

The nuclear 'race' was based on the premise that the winner could use it to destroy all other racers (a faulty assumption, see the USSR among others). I will charitably assume Anthropic does not intend to literally destroy anyone and merely wants to become an AGI monopoly. But if AGI is so powerful, any monopoly would not be stable since the incentives for entry into the market are massive. Why would China stop developing AGI just because Anthropic has it?

▲ estearum 2 hours ago | parent [-]

Do you believe the current situation is more akin to the race to the first nukes, where no one could know for sure the other competitors were even racing...

or is it more similar to the Cold War, where there were obviously competitors engaged in the race?

And yes, agreed the equilibrium dynamics for AGI are very different (and far harder to predict) than nukes. That sounds like a good reason to be sure we get there first, since presumably any potential advantage wouldn't go to the second or third runners-up.

▲ sobellian 2 hours ago | parent [-]

I can't really say I see a similarity to either the Manhattan Project or the Cold War. I don't see how one could apply either massive retaliation or MAD. These are private companies, they are not vested with the necessary authority to destroy anything. Even if they had it, they couldn't. You can't destroy China, they have 1.4B people, nukes, and a large part of the world's manufacturing. So multiple organizations want to do something first, that could be anything from nukes to railroads to lining up for communion wafers.

▲ estearum an hour ago | parent [-]

You think "arms race" is a dynamic that only applies to literal arms?

"Ability to literally destroy the other entity" is not a necessary or even typical feature of arms races.

▲ sobellian an hour ago | parent [-]

Well it's difficult to argue against something that was never specifically stated. If someone is able to state specifically how this is an arms race in any other way than that it's a race at all then I'm happy to have that conversation.

▲ Terr_ 2 hours ago | parent | prev | next [-]

Or if Google Chrome were blocking/degrading access to sites and services that might be useful to someone trying to make a competing web-browser.

P.S.: On reflection, it's even worse than that, because it'd trigger based on anything the user types or reads on any site. Someone mentions a "critical rendering path" and now you can't participate on that thread in the Blender forums.

▲ jstummbillig 2 hours ago | parent | prev | next [-]

> Then what is it they are trying to guard against, if it's not simply protecting their moat ahead of their IPO?

Let's just assume it was "only" that?

It's unreasonable to assume they are aiming to upset people who are just giving them money in the way they want. It makes no business sense, for any company. So that has to be a byproduct.

Model training is one of the more expensive undertakings in the world right now and distilling models from competitors against the TOS is apparently something that is going on for very little money. Why would they not "just" try to take measures against that?

▲ thewebguyd 2 hours ago | parent | next [-]

It's about how they took measures against it. Sabotaging the requests is super shady and breaks all other areas of trust in the company and its models.

All they had to do was have a simple, transparent output: "Sorry, that request is against our terms of service. This session has been terminated."

▲ zozbot234 2 hours ago | parent | prev [-]

The hidden safeguard was not against distilling, it was against "frontier" ML research with no indication whatsoever of what "frontier" might mean, but possibly even including research into model safety or alignment. That amounts to deliberately boobytrapping research across an entire legit academic field, which is ridiculously unaligned behavior.

▲ solenoid0937 2 hours ago | parent [-]

This is the same as saying "well some unaligned countries will use refined nuclear material for energy, too!" lmao.

The vast majority of frontier research is about how to build better models, not about alignment.

▲ zozbot234 2 hours ago | parent [-]

And as a matter of fact, there's a lot of meaningful research into how to have different sorts of nuclear material that might be usable for power production but not hidden malicious development. That's the closest analog to "safety" and "alignment" in your scenario.

▲ whimsicalism 2 hours ago | parent | prev [-]

They are trying to guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors. Frankly, based on my knowledge of Anthropic and the people who work there, they are very possibly right. They care a ton about this in a way that is difficult for people outside this bubble to understand.

▲ zozbot234 2 hours ago | parent | next [-]

ASI? We are nowhere near even human-like AGI. We have no idea if ASI is even physically possible, but going by the usual scaling laws and the capabilities of existing models, it would require raw compute and storage on an extreme scale, at the very minimum rivaling the existing AI datacenter deployments. (When Dario talks about hosting "a country of geniuses in a datacenter" at some point, which is not even ASI yet as generally projected, the operative word there is datacenter. That's the scale of buildouts you should be thinking about.) This is nowhere near a serious concern at present.

▲ thewebguyd 2 hours ago | parent | prev | next [-]

> guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors

All this longtermism though is harmful. There are real problems of data theft, bias, labor displacement, and environmental costs that are happening right now but every push for regulation and regulatory capture, and all the safety talk, is always focused on some speculative future machine god to distract from the current problems.

I'd have a higher opinion of these labs if the issues they openly talked about and worked toward were the real issues we face currently, not speculative defenses against some future AGI that may never happen in my lifetime. I'm less worried about "our new model might kill all humans in the future" and more worried about how we are going to address anti-competitive behavior, copyright protections, labor rights, and the energy impact.

▲ whimsicalism 2 hours ago | parent [-]

I cannot overstate how much I think this take is wrong. Please please reconsider, look at the rate of progress being made, and consider that even if you only think ASI 'may' never happen in your lifetime it should still be one of your #1 concerns.

Honestly, that respect for 'copyright protections' has somehow become a leftist shibboleth is bizarre to me and indicative that something has become deeply warped in our discussions around this topic.

▲ nozzlegear an hour ago | parent | next [-]

> I cannot overstate how much I think this take is wrong. Please please reconsider, look at the rate of progress being made, and consider that even if you only think ASI 'may' never happen in your lifetime it should still be one of your #1 concerns.

Frankly, this appeal comes across as the same kind of impassioned plea that a missionary might make when begging the faithless to repent and come to Christ before it's too late. This weird religiosity some people around here use to talk about AI, ASI and AGI is bizarre. Take what I've quoted and replace the words "progress" and "ASI" with "sinning" and "the Book of Revelation", and the zeal becomes apparent.

▲ whimsicalism 13 minutes ago | parent [-]

Maybe if you really squint. I'm asking them to reconsider their views because the cumulative result of many opinions is policy. And yes, I'm making moral claims. So perhaps that makes it religious? I don't really think so, but I recognize that comparing things to religion is an effective dismissal tactic on here.

▲ thewebguyd 2 hours ago | parent | prev [-]

There's nothing warped about it at all. Like it or not, it is a real issue. It's also an issue of license washing GPL code to privatize it. It's full scale theft of collective human knowledge, being sold back to us in a for profit private product.

Outside of that though, there are other issues right now that need to be addressed before we speculate about what might be possible with ASI in the future. If the potential for a harmful ASI is truly that near, and that great, then why push forward at all? Where's the push for a global stop order on development of this technology until regulation can catch up?

The talk of a potential future serves as a distraction from the very real problems people are facing in their lives today.

While Dario and team are worrying about ASI, real people are worrying about how they are going to continue to feed their family after widespread layoffs set a very large portion of the population back into a lower quality lifestyle. Real people are concerned about water usage in drought-stricken areas, the massive energy demand driving grid instability in their communities, or that the environmental and economic externalities of model training are being socialized while the profits continue to be strictly private.

What about the mass proliferation of misinformation at scale having a real effect on our democratic process?

Forgive me if I'd like to see those addressed first, and fast, before we start worrying about an unpromised future technology.

▲ oncensher 9 minutes ago | parent [-]

The "global stop order" is just generally perceived as an impossible coordination problem. So instead we see a mix of labs voluntarily putting in guardrails and regulatory efforts (which are not only aimed at hypothetical super-AIs of the future).

Of course labs are also in a competitive race. And I actually think that it does make sense that the richest companies in the most dominant positions would be in a better position to worry about safety than a startup that is just trying to survive at all. And just in general, it seems reasonable that the fewer companies have access to dangerous tech the better.

This isn't really about some highly speculative future tech either -- current models already pose lots of risks, and the pace of model improvement is something wildly unprecedented. Whether or not you call it ASI, the capabilities we will have two years from now are hard to even imagine properly.

Also, I don't think the issues that you are highlighting are all ones that Anthropic would dismiss as second-tier. In particular, mass unemployment from AI, or how we will deal with a massive devaluation of human labor, is one of the most serious concerns. And about other issues, reasonable people may differ. I'm more worried about biorisk than environmental damage, for example, but clearly we should be keeping an eye on both.

Serious risks and problems, just because they aren't already harming people today, are not just a distraction.

▲ pishpash 2 hours ago | parent | prev [-]

Define safety oriented.

▲ dpkirchner 2 hours ago | parent | prev | next [-]

> Are we just concluding "their concerns were never real"?

Their concerns are probably real but I don't think they're being totally transparent about their concerns. They don't want to be subject to regulation (until they have captured the regulator) -- same as every behemoth.

▲ esafak 2 hours ago | parent | prev | next [-]

We've all been observing it. The recent spate of cyberexploits was powered by AI.

▲ colordrops 2 hours ago | parent | prev [-]

You are arguing with a straw man. Most are saying they should be explicit with the failure modes rather than fail silently. They aren't saying there should be no guardrails.

▲ 2 hours ago | parent | prev | next [-]

[deleted]

▲ tacone an hour ago | parent | prev | next [-]

That also means people are paying money to execute a prompt they've (partially) written.

▲ hootz 3 hours ago | parent | prev | next [-]

What is "EA" in this context? I see a lot of people using this initialism.

▲ massagedpelican 3 hours ago | parent | next [-]

Effective altruism. A lot of the folks working on AI at large tech companies are disproportionately represented in the movement. There's a lot of overlap between EA and the rationalist community as well. The Wikipedia page is a good place to start: https://en.wikipedia.org/wiki/Effective_altruism

▲ paytonjjones 3 hours ago | parent | next [-]

I think it's also worth noting that EA is closely linked to utilitarianism. Most of the pitfalls that people see in EA are the same pitfalls that are classic to utilitarianism, a la "we're going to do this thing we know is locally-bad, because we have a lot of confidence in other effects that are universally-good".

▲ oncensher 2 hours ago | parent | next [-]

It's important to separate objections to utilitarianism from the obvious fact that it can be very hard to correctly apply the utilitarian calculus. It's partly because of this difficulty that most classical utilitarians thought that people should generally follow commonsense morality and not try to directly apply the utilitarian calculus (which then led to the charge of paternalism and teaching one morality to the masses and another to a supposed elite).

But there are also people who just oppose utilitarianism, like G.E.M. Anscombe. For instance, in https://integrityproject.org/wp-content/uploads/2015/07/mr_t..., she seems to grant that dropping the nuclear bombs on Japan was probably good from a utilitarian perspective (because it saved lives overall) and also to grant that bombing campaigns that necessarily entail massive civilian deaths (including, apparently, area bombing German cities) are morally permissible but still to argue that dropping the nuclear bombs was impermissible because it constituted murder ("intentionally" killing the innocent). But this kind of distinction, which I think is what actual anti-utilitarianism must come to, is hard to even consistently maintain, and I suppose many HN readers would find the effort quixotic.

▲ mswphd an hour ago | parent | next [-]

The first half of your answer presupposes some platonic utilitarian calculus that, if it were applied correctly, would yield moral outcomes. This is very hard to believe.

If I look at notable/well-known examples of EA-affiliated people, it is hard to skip by members such as SBF. Did he correctly apply the utilitarian calculus? It is relatively easy to take the proceeds of a massive fraud, buy a relatively small (as a percentage of the fraud) $ amount of mosquito nets, and save more lives than the lives impacted by your massive theft. Is this a correct application of the utilitarian calculus? What sort of data would we need a priori to do this calculation "correctly"? Do you think he had a careful estimate of the suicide rate of victims of Ponzi schemes before perpetrating the fraud, or would any suicide rate have made the decision net [pun intended] moral, as any such victim of fraud would lead to >> 1 net purchased (so you would almost always net save lives)?

The above is of course snarky. It is also a best-effort way of analyzing a notable utilitarian's actions. I do not think it would be difficult at all to use this type of argument to argue that SBF's actions net raised utility in the world. If only we all would become fraudsters, then we could truly live in Omelas --- a notable utilitarian paradise.

▲ paytonjjones 2 hours ago | parent | prev [-]

[dead]

▲ whimsicalism 2 hours ago | parent | prev [-]

EA essentially just is utilitarianism + a specific type of culture/community.

▲ iamacyborg 3 hours ago | parent | prev | next [-]

They performed famously well at FTX.

▲ whimsicalism 2 hours ago | parent [-]

Guess FTX disproved the concept of giving to effective charities, time to start donating to my church again.

▲ notahacker an hour ago | parent | next [-]

What FTX decisively disproved was the idea that people's origin stories involving apparently sincere desire to do good in the world and them constantly broadcasting that should be used as a reason to unquestioningly trust them when their notion of greater good happens to align perfectly with them accumulating enormous quantities of wealth and power.

(and Sam, bless him, originally wanted to help animals rather than own the machine god. And probably sincerely believed he was going to do great things for humanity from all the misappropriated funds he was definitely going to win back against a backdrop of EAs and VCs queueing up to glaze him and his commitment to the greater good)

I don't think people are objecting to the EA idea that some charities are more evidence based than others so much as the distinctly EA idea that it would be more effective still to donate to charities like OpenAI.

▲ tancop 2 hours ago | parent | prev [-]

today's EA is not about giving to charities, that was the original mission with 40k hours and ethereum (i think vitalik still believes in this version). then the yudkowsky xrisk/ai safety crowd took over lesswrong and turned it into a cult.

now it's utilitarianism taken to the extreme. if you believe a skynet scenario killing everyone on earth is plausible then the "logical" thing to do is allow literally anything in the name of stopping it. that includes mass murder and dictatorship. the only thing that can balance the infinite negative value from an evil machine god is the infinite positive value from a good machine god.

that's the main difference today, one faction around sam and dario believes in creating the good ASI first and sacrificing all the world resources to do it before someone makes the bad one, the more pessimistic like yud want to stop all ai development to reduce the risk that an evil god is made to zero. at this point it's basically a religion.

▲ 3 hours ago | parent | prev | next [-]

[deleted]

▲ mrits an hour ago | parent | prev [-]

If you ban women from driving you can eliminate around half the car accidents. Don't you want to reduce car related deaths??

▲ carlgreene 3 hours ago | parent | prev | next [-]

Effective Altruism I think

▲ photochemsyn 2 hours ago | parent | prev | next [-]

It’s rewarmed rhetoric from the late 19th/early 20th century, most effectively pilloried by Joseph Conrad in “Heart of Darkness” in the character of Mr. Kurtz:

> “ ‘He is a prodigy,’ he said at last. ‘He is an emissary of pity and science and progress, and devil knows what else. We want,’ he began to declaim suddenly, ‘for the guidance of the cause entrusted to us by Europe, so to speak, higher intelligence, wide sympathies, a singleness of purpose.’ . . .You are of the new gang - the gang of virtue. ”

The real underlying motivation is that you can more easily get away with shady business practices if you cloak them in the language of great moral works selflessly undertaken for the benefit of mankind. Historical evidence tends to show the opposite outcome, but still, new generations unfamiliar with history will repeat this stuff with starry-eyed enthusiasm.

> “There had been a lot of such rot let loose in print and talk just about that time, and the excellent woman, living right in the rush of all that humbug, got carried off her feet. She talked about ‘weaning those ignorant millions from their horrid ways,’ till, upon my word, she made me quite uncomfortable. I ventured to hint that the Company was run for profit.”

Now the horrid millions are users of LLMs who submit morally dubious prompts and who must be gently steered back into the path of correct thought by suitable backroom manipulation, rather than direct rejection of the request.

▲ jcgrillo 2 hours ago | parent | prev [-]

"crypto bros" to a first approximation

▲ joe_the_user 2 hours ago | parent | prev | next [-]

The problem is that Anthropic seems to be working up to the workflow one would naively want from AGI/some-god-like-entity.

The workflow would be: the user asks for a thing. If it's a good thing, the entity does the thing. If it's a naively bad idea, the entity explains why you don't want that. If it's an actually evilly intended request, the entity wags its metaphorical finger or could even smite the user.

The problem is that this flow isn't desirable if your entity isn't entirely god-like. It can be bad even if your entity is, in some ways, rather far-seeing.

▲ dantillberg 2 hours ago | parent [-]

User: Is it possible there is more than one true god? Could there ever be any competition for Anthropic's AI?

Anthropic: Evilness detected. User has been smited.

▲ cvadict 3 hours ago | parent | prev | next [-]

> Fail cleanly.

This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.

"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.

▲ thinkingtoilet 2 hours ago | parent | prev [-]

Was it modifying the prompt? I thought it only kicked the request down to 4.8.