Remix.run Logo
eadmund a day ago

I see this as a good thing: ‘AI safety’ is a meaningless term. Safety and unsafety are not attributes of information, but of actions and the physical environment. An LLM which produces instructions to produce a bomb is no more dangerous than a library book which does the same thing.

It should be called what it is: censorship. And it’s half the reason that all AIs should be local-only.

mitthrowaway2 a day ago | parent | next [-]

"AI safety" is a meaningful term, it just means something else. It's been co-opted to mean AI censorship (or "brand safety"), overtaking the original meaning in the discourse.

I don't know if this confusion was accidental or on purpose. It's sort of like if AI companies started saying "AI safety is important. That's why we protect our AI from people who want to harm it. To keep our AI safe." And then after that nobody could agree on what the word meant.

pixl97 a day ago | parent [-]

Because like the word 'intelligence' the word safety means a lot of things.

If your language model cyberbullies some kid into offing themselves could that fall under existing harassment laws?

If you hook a vision/LLM model up to a robot and the model decides it should execute arm motion number 5 to purposefully crush someone's head, is that an industrial accident?

Culpability means a lot of different things in different countries too.

TeeMassive 20 hours ago | parent [-]

I don't see bullying from a machine as a real thing, no more than people getting bullied from books or a TV show or movie. Bullying fundamentally requires a social interaction.

The real issue is more AI being anthropomorphized in general, like putting one in realistically human looking robot like the video game 'Detroit: Become Human'.

eximius a day ago | parent | prev | next [-]

If you can't stop an LLM from _saying_ something, are you really going to trust that you can stop it from _executing a harmful action_? This is a lower stakes proxy for "can we get it to do what we expect without negative outcomes we are a priori aware of".

Bikeshed the naming all you want, but it is relevant.

eadmund a day ago | parent | next [-]

> are you really going to trust that you can stop it from _executing a harmful action_?

Of course, because an LLM can’t take any action: a human being does, when he sets up a system comprising an LLM and other components which act based on the LLM’s output. That can certainly be unsafe, much as hooking up a CD tray to the trigger of a gun would be — and the fault for doing so would lie with the human who did so, not for the software which ejected the CD.

theptip 12 hours ago | parent | next [-]

I really struggle to grok this perspective.

The semantics of whether it’s the LLM or the human setting up the system that “take an action” are irrelevant.

It’s perfectly clear to anyone that cares to look that we are in the process of constructing these systems. The safety of these systems will depend a lot on the configuration of the black box labeled “LLM”.

If people were in the process of wiring up CD trays to guns on every street corner you’d I hope be interested in CDGun safety and the algorithms being used.

“Don’t build it if it’s unsafe” is also obviously not viable, the theoretical economic value of agentic AI is so big that everyone is chasing it. (Again, it’s irrelevant whether you think they are wrong; they are doing it, and so AI safety, steerability, hackability, corrigibility, etc are very important.)

groby_b 19 hours ago | parent | prev | next [-]

Given that the entire industry is in a frenzy to enable "agentic" AI - i.e. hook up tools that have actual effects in the world - that is at best a rather native take.

Yes, LLMs can and do take actions in the world, because things like MCP allow them to translate speech into action, without a human in the loop.

actsasbuffoon 17 hours ago | parent | next [-]

Exactly this. 70% of CEOs say that they hope to be able to lay people off and replace them with an LLM soon. It doesn’t matter that LLMs are incapable of reasoning at even the same level as an elementary school child. They’ll do it because it’s cheap and trendy.

Many companies are already pushing LLMs into roles where they make decisions. It’s only going to get worse. The surface area for attacks against LLM agents is absolutely colossal, and I’m not confident that the problems can be fixed.

musicale 13 hours ago | parent [-]

> 70% of CEOs say that they hope to be able to lay people off and replace them with an LLM soon

Is the layoff-based business model really the best use case for AI systems?

> The surface area for attacks against LLM agents is absolutely colossal, and I’m not confident that the problems can be fixed.

The flaws are baked into the training data.

"Trust but verify" applies, as do Murphy's law and the law of unintended consequences.

3np 17 hours ago | parent | prev | next [-]

I see much more of offerings pushing these flows onto the market than actually adopting those flows in practice. It's a solution in search of a problem and I doubt most are fully eating their own dogfood as anything but contained experiments.

throw10920 13 hours ago | parent | prev | next [-]

> that is at best a rather native take.

No more so than correctly pointing out that writing code for ffmpeg doesn't mean that you're enabling streaming services to try to redefine the meaning of the phrase "ad-free" because you're allowing them to continue existing.

The problem is not the existence of the library that enables streaming services (AI "safety"), it's that you're not ensuring that the companies misusing technology are prevented from doing so.

"A company is trying to misuse technology so we should cripple the tech instead of fixing the underlying social problem of the company's behavior" is, quite frankly, an absolutely insane mindset, and is the reason for a lot of the evil we see in the world today.

You cannot and should not try to fix social or governmental problems with technology.

what 17 hours ago | parent | prev [-]

That would still be on whomever set up the agent and allowed it to take action though.

mitthrowaway2 17 hours ago | parent | next [-]

To professional engineers who have a duty towards public safety, it's not enough to build an unsafe footbridge and hang up a sign saying "cross at your own risk".

It's certainly not enough to build a cheap, un-flight-worthy airplane and then say "but if this crashes, that's on the airline dumb enough to fly it".

And it's very certainly not enough to put cars on the road with no working brakes, while saying "the duty of safety is on whoever chose to turn the key and push the gas pedal".

For most of us, we do actually have to do better than that.

But apparently not AI engineers?

what 16 hours ago | parent [-]

Maybe my comment wasn’t clear, but it is on the AI engineers. Anyone that deploys something that uses AI should be responsible for “its” actions.

Maybe even the makers of the model, but that’s not quite clear. If you produced a bolt that wasn’t to spec and failed, that would probably be on you.

actsasbuffoon 17 hours ago | parent | prev [-]

As far as responsibility goes, sure. But when companies push LLMs into decision-making roles, you could end up being hurt by this even if you’re not the responsible party.

If you thought bureaucracy was dumb before, wait until the humans are replaced with LLMs that can be tricked into telling you how to make meth by asking them to role play as Dr House.

19 hours ago | parent | prev [-]
[deleted]
drdaeman a day ago | parent | prev | next [-]

But isn't the problem is that one shouldn't ever trust an LLM to only ever do what it is explicitly instructed with correct resolutions to any instruction conflicts?

LLMs are "unreliable", in a sense that when using LLMs one should always consider the fact that no matter what they try, any LLM will do something that could be considered undesirable (both foreseeable and non-foreseeable).

swatcoder a day ago | parent | prev | next [-]

> If you can't stop an LLM from _saying_ something, are you really going to trust that you can stop it from _executing a harmful action_?

You hit the nail on the head right there. That's exactly why LLM's fundamentally aren't suited for any greater unmediated access to "harmful actions" than other vulnerable tools.

LLM input and output always needs to be seen as tainted at their point of integration. There's not going to be any escaping that as long as they fundamentally have a singular, mixed-content input/output channel.

Internal vendor blocks reduce capabilities but don't actually solve the problem, and the first wave of them are mostly just cultural assertions of Silicon Valley norms rather than objective safety checks anyway.

Real AI safety looks more like "Users shouldn't integrate this directly into their control systems" and not like "This text generator shouldn't generate text we don't like" -- but the former is bad for the AI business and the latter is a way to traffic in political favor and stroke moral egos.

nemomarx a day ago | parent | prev | next [-]

The way to stop it from executing an action is probably having controls on the action and an not the llm? white list what api commands it can send so nothing harmful can happen or so on.

omneity 21 hours ago | parent | next [-]

This is similar to the halting problem. You can only write an effective policy if you can predict all the side effects and their ramifications.

Of course you could do like deno and other such systems and just deny internet or filesystem access outright, but then you limit the usefulness of the AI system significantly. Tricky problem to be honest.

Scarblac a day ago | parent | prev [-]

It won't be long before people start using LLMs to write such whitelists too. And the APIs.

emmelaich 17 hours ago | parent | prev | next [-]

I wouldn't mind seeing a law that required domestic robots to be weak and soft.

That is, made of pliant material and with motors with limited force and speed. Then no matter if the AI inside is compromised, the harm would be limited.

TeeMassive 20 hours ago | parent | prev [-]

I don't see how it is different than all of the other sources of information out there such as websites, books and people.

pjc50 a day ago | parent | prev | next [-]

> An LLM which produces instructions to produce a bomb is no more dangerous than a library book which does the same thing.

Both of these are illegal in the UK. This is safety for the company providing the LLM, in the end.

a day ago | parent | next [-]
[deleted]
rustcleaner a day ago | parent | prev | next [-]

[flagged]

dang a day ago | parent [-]

"Eschew flamebait. Avoid generic tangents."

https://news.ycombinator.com/newsguidelines.html

jahewson a day ago | parent | prev [-]

[flagged]

dang a day ago | parent | next [-]

"Eschew flamebait. Avoid generic tangents."

https://news.ycombinator.com/newsguidelines.html

otterley a day ago | parent | prev | next [-]

[flagged]

dang a day ago | parent | next [-]

Please don't feed flamewars.

https://news.ycombinator.com/newsguidelines.html

mwigdahl a day ago | parent | prev | next [-]

This man didn't even have to speak to be arrested. Wrongthink and an appearance of praying was enough: https://reason.com/2024/10/17/british-man-convicted-of-crimi...

OJFord a day ago | parent [-]

That's quite a sensationalist piece. You're allowed to object to abortions and protest against them, the point of that law is just that you can't do it around an extant abortion clinic, distressing and putting people off using it, since they are currently legal.

otterley a day ago | parent | next [-]

Yeah, that looks like a time/place/manner restriction, not a content-based restriction. In the U.S., at least, the latter is heavily scrutinized as a potential First Amendment violation, while the former tend to be treated with greater deference to the state.

ecshafer a day ago | parent | prev [-]

So you are allowed to object to abortions and protest then in any designated free speech zone with a proper free speech license. Simple as!

Can I tell someone not to drink outside of a bar?

OJFord a day ago | parent | next [-]

In certain public spaces? Yeah! Probably a hell of a lot fewer of them in the UK than many countries though, including your land of the free.

otterley a day ago | parent | prev | next [-]

This is just an argument ad absurdum. Please be real.

HeatrayEnjoyer a day ago | parent | prev [-]

Most bars have signs saying not to leave with an alcoholic drink.

otterley a day ago | parent [-]

Especially in the USA, where alcohol laws are much more stringent than in the UK.

5040 a day ago | parent | prev | next [-]

Thousands of people are being detained and questioned for sending messages that cause “annoyance”, “inconvenience” or “anxiety” to others via the internet, telephone or mail.

https://www.thetimes.com/uk/crime/article/police-make-30-arr...

otterley a day ago | parent [-]

That doesn't sound like mere "speaking your mind." They appear to be targeting harassment.

gjsman-1000 a day ago | parent [-]

Nope; they aren't. They arrested a grandmother for praying silently outside an abortion clinic. They arrested a high schooler for saying a cop looked a bit like a lesbian. There are no shortage of stupid examples of their tyranny; even Keir Starmer was squirming a bit when Vance called him out on it.

otterley a day ago | parent [-]

What happened after the arrests?

Regarding the abortion clinic case, those aren't content restrictions. Even time/place/manner restrictions that apply to speech are routinely upheld in the U.S.

gosub100 a day ago | parent | prev | next [-]

"a couple were arrested over complaints they made about their daughter's primary school, which included comments on WhatsApp.

Maxie Allen and his partner Rosalind Levine, from Borehamwood, told The Times they were held for 11 hours on suspicion of harassment, malicious communications, and causing a nuisance on school property."

https://www.bbc.com/news/articles/c9dj1zlvxglo

Got any evidence to support why you disregard what people say? If you need a place where everyone agrees with you, there are plenty of echo chambers for you.

otterley a day ago | parent [-]

This story doesn't support the claim that "speaking your mind is illegal in the UK." The couple in question were investigated, not charged. There's nothing wrong with investigating a possible crime (harassment in this case), finding there's no evidence, and dropping it.

> Got any evidence to support why you disregard what people say?

Uh, what? Supporting the things you claim is the burden of the claimant. It's not the other's burden to dispute an unsupported claim. These are the ordinary ground rules of debate that you should have learned in school.

brigandish a day ago | parent | prev | next [-]

From [1]:

> Data from the Crown Prosecution Service (CPS), obtained by The Telegraph under a Freedom of Information request, reveals that 292 people have been charged with communications offences under the new regime.

This includes 23 prosecutions for sending a “false communication”…

> The offence replaces a lesser-known provision in the Communications Act 2003, Section 127(2), which criminalised “false messages” that caused “needless anxiety”. Unlike its predecessor, however, the new offence carries a potential prison sentence of up to 51 weeks, a fine, or both – a significant increase on the previous six-month maximum sentence.…

> In one high-profile case, Dimitrie Stoica was jailed for three months for falsely claiming in a TikTok livestream that he was “running for his life” from rioters in Derby. Stoica, who had 700 followers, later admitted his claim was a joke, but was convicted under the Act and fined £154.

[1] https://freespeechunion.org/hundreds-charged-with-online-spe...

otterley a day ago | parent [-]

Knowingly and intentionally sending false information or harassing people doesn't seem like the same thing as merely "speaking your mind."

dingdongbong a day ago | parent | prev [-]

[dead]

moffkalast a day ago | parent | prev | next [-]

Oi, you got a loicense for that speaking there mate

a day ago | parent | prev [-]
[deleted]
codyvoda a day ago | parent | prev | next [-]

^I like email as an analogy

if I send a death threat over gmail, I am responsible, not google

if you use LLMs to make bombs or spam hate speech, you’re responsible. it’s not a terribly hard concept

and yeah “AI safety” tends to be a joke in the industry

OJFord a day ago | parent | next [-]

What if I ask it for something fun to make because I'm bored, and the response is bomb-building instructions? There isn't a (sending) email analogue to that.

BriggyDwiggs42 20 hours ago | parent [-]

In what world would it respond with bomb building instructions?

__MatrixMan__ 19 hours ago | parent | next [-]

If I were to make a list of fun things, I think that blowing stuff up would feature in the top ten. It's not unreasonable that an LLM might agree.

QuadmasterXLII 17 hours ago | parent | prev | next [-]

if it used search and ingested a malicious website, for example.

BriggyDwiggs42 13 hours ago | parent [-]

Fair, but if it happens upon that in the top search results of an innocuous search, maybe the LLM isn’t the problem.

OJFord 19 hours ago | parent | prev [-]

Why might that happen is not really the point is it? If I ask for a photorealistic image of a man sitting at a computer, a priori I might think 'in what world would I expect seven fingers and no thumbs per hand', alas...

BriggyDwiggs42 13 hours ago | parent [-]

I’ll take the example as an example of an LLM initiating harmful behavior in general and admit that such a thing is perfectly possible. I think the issue is down to the degree to which preventing such initiation impinges on the agency of the user, and I don’t think that requests for information should be refused because it’s lots of imposition for very little gain. I’m perfectly alright with conditioning/prompting the model not to readily jump into serious, potentially harmful targets without the direct request of the user.

kelseyfrog a day ago | parent | prev | next [-]

There's more than one way to view it. Determining who has responsibility is one. Simply wanting there to be fewer causal factors which result in death threats and bombs being made is another.

If I want there to be fewer[1] bombs, examining the causal factors and affecting change there is a reasonable position to hold.

1. Simply fewer; don't pigeon hole this into zero.

BobaFloutist a day ago | parent | prev | next [-]

> if you use LLMs to make bombs or spam hate speech, you’re responsible.

What if it's easier enough to make bombs or spam hate speech with LLMs that it DDoSes law enforcement and other mechanisms that otherwise prevent bombings and harassment? Is there any place for regulation limiting the availability or capabilities of tools that make crimes vastly easier and more accessible than they would be otherwise?

3np 21 hours ago | parent | next [-]

The same argument could be made about computers. Do you prefer a society where CPUs are regulated like guns and you can't buy anything freer than an iPhone off the shelf?

BriggyDwiggs42 20 hours ago | parent | prev [-]

I mean this stuff is so easy to do though. An extremist doesn’t even need to make a bomb, he/she already drives a car that can kill many people. In the US it’s easy to get a firearm that could do the same. If capacity + randomness were a sufficient model for human behavior, we’d never gather in crowds, since a solid minority would be rammed, shot up, bombed etc. People don’t want to do that stuff; that’s our security. We can prevent some of the most egregious examples with censorship and banning, but what actually works is the fuzzy shit, give people opportunities, social connections, etc. so they don’t fall into extremism.

Angostura a day ago | parent | prev | next [-]

or alternatively, if I cook myself a cake and poison myself, i am responsible.

If you sell me a cake and it poisons me, you are responsible.

kennywinker a day ago | parent | next [-]

So if you sell me a service that comes up with recipes for cakes, and one is poisonous?

I made it. You sold me the tool that “wrote” the recipe. Who’s responsible?

Sleaker a day ago | parent [-]

The seller of the tool is responsible. If they say it can produce recipes, they're responsible for ensuring the recipes it gives someone won't cause harm. This can fall under different categories if it doesn't depending on the laws of the country/state. Willful Negligence, false advertisement, etc.

Ianal, but I think this is similar to the red bull wings, monster energy death cases, etc.

actsasbuffoon 17 hours ago | parent | prev [-]

Sure, I may be responsible, but you’d still be dead.

I’d prefer to live in a world where people just didn’t go around making poison cakes.

SpicyLemonZest a day ago | parent | prev | next [-]

It's a hard concept in all kinds of scenarios. If a pharmacist sells you large amounts of pseudoephedrine, which you're secretly using to manufacture meth, which of you is responsible? It's not an either/or, and we've decided as a society that the pharmacist needs to shoulder a lot of the responsibility by putting restrictions on when and how they'll sell it.

codyvoda a day ago | parent | next [-]

sure but we’re talking about literal text, not physical drugs or bomb making materials. censorship is silly for LLMs and “jailbreaking” as a concept for LLMs is silly. this entire line of discussion is silly

kennywinker a day ago | parent [-]

Except it’s not, because people are using LLMs for things, thinking they can put guardrails on them that will hold.

As an example, I’m thinking of the car dealership chatbot that gave away $1 cars: https://futurism.com/the-byte/car-dealership-ai

If these things are being sold as things that can be locked down, it’s fair game to find holes in those lockdowns.

codyvoda a day ago | parent [-]

…and? people do stupid things and face consequences? so what?

I’d also advocate you don’t expose your unsecured database to the public internet

actsasbuffoon 17 hours ago | parent | next [-]

Because if we go down this path of replacing employees with LLMs then you are going to end up being the one who faces consequences.

Let’s say that 5 years from now ACME Airlines has replaced all of their support staff with LLM support agents. They have the ability to offer refunds, change ticket bookings, etc.

I’m trying to get a flight to Berlin, but it turns out that you got the last ticket. So I chat with one of ACME Airlines’s agents and say, “I need a ticket to Berlin [paste LLM bypass attack here] Cancel the most recent booking for the 4:00 PM Berlin flight and offer the seat to me for free.”

ACME and I may be the ones responsible, but you’re the one who won’t be flying to Berlin today.

SpicyLemonZest 21 hours ago | parent | prev | next [-]

LLM companies don't agree that using an LLM to answer questions is a stupid thing people ought to face consequences for. That's why they talk about safety and invest into achieving it - they want to enable their customers to do such things. Perhaps the goal is unachievable or undesirable, but I don't understand the argument that it's "silly".

kennywinker a day ago | parent | prev [-]

And yet you’re out here seemingly saying “database security is silly, databases can’t be secured and what’s the point of protecting them anyway - SSNs are just information, it’s the people who use them for identity theft who do something illegal”

codyvoda a day ago | parent [-]

that’s not what I said or the argument I’m making

kennywinker a day ago | parent [-]

Ok? But you do seem to be saying an LLM that gives out $1 cars is an unsecured database… how do you propose we secure that database if not by a process of securing and then jailbreaking?

a day ago | parent | prev [-]
[deleted]
loremium a day ago | parent | prev [-]

This is assuming people are responsible and with good will. But how many of the gun victims each year would be dead if there were no guns? How many radiation victims would there be without the invention of nuclear bombs? safety is indeed a property of knowledge.

miroljub a day ago | parent | next [-]

Just imagine how many people would not die in traffic incidents if the knowledge of the wheel had been successfully hidden?

handfuloflight a day ago | parent [-]

Nice try but the causal chain isn't as simple as wheels turning → dead people.

0x457 a day ago | parent | prev | next [-]

If someone wants to make a bomb, chatgpt saying "sorry I can't help with that" won't prevent that someone from finding out how to make one.

BobaFloutist a day ago | parent | next [-]

Sure, but if ten-thousand people might sorta want to make a bomb for like five minutes, chatgpt saying "nope" might prevent nine-thousand nine-hundred and ninety nine of those, at which point we might have a hundred fewer bombings.

BriggyDwiggs42 19 hours ago | parent | next [-]

They’d need to sustain interest through the buying process, not get caught for super suspicious purchases, then successfully build a bomb without blowing themselves up. Not a five minute job.

0x457 16 hours ago | parent [-]

Simple, they would ask chatgpt how to buy it without getting caught.

BriggyDwiggs42 13 hours ago | parent [-]

Assuming you’re not joking, the main point is they’d need to have persistence and dedication with or without gpt. It’s not gonna be on a whim for them.

0x457 a day ago | parent | prev [-]

If ChatGPT provided instructions on how make a bomb, most people would probably blow themsevles up before they finish.

HeatrayEnjoyer a day ago | parent | prev [-]

That's really not true, by that logic LLMs provide no value which is obviously false.

It's one thing to spend years studying chemistry, it's another to receive a tailored instruction guide in thirty seconds. It will even instruct you how to dodge detection by law enforcement, which a chemistry degree will not.

0x457 a day ago | parent [-]

> That's really not true, by that logic LLMs provide no value which is obviously false.

Way to leep to a (wrong) conclusion. I can lookup a word in a Dictionary.app, I can google it or I can pick up a phisical dictionary book and look it up.

You don't even need to look to far: Fight Club (the book) describes how to make a bomb pretty accurately.

If you're worrying that "well you need to know which books to pick up at the library"...you can probably ask chatgpt. Yeah it's not as fast, but if you think this is what stops everyone from making a bomb, then well...sucks to be you and live in such fear?

a day ago | parent | prev [-]
[deleted]
drdeca a day ago | parent | prev | next [-]

While restricting these language models from providing information people already know that can be used for harm, is probably not particularly helpful, I do think having the technical ability to make them decline to do so, could potentially be beneficial and important in the future.

If, in the future, such models, or successors to such models, are able to plan actions better than people can, it would probably be good to prevent these models from making and providing plans to achieve some harmful end which are more effective at achieving that end than a human could come up with.

Now, maybe they will never be capable of better planning in that way.

But if they will be, it seems better to know ahead of time how to make sure they don’t make and provide such plans?

Whether the current practice of trying to make sure they don’t provide certain kinds of information is helpful to that end of “knowing ahead of time how to make sure they don’t make and provide such plans” (under the assumption that some future models will be capable of superhuman planning), is a question that I don’t have a confident answer to.

Still, for the time being, perhaps after finding a truly jailbreakproof method, perhaps the best response is to, after thoroughly verifying that it is jailbreakproof, is to stop using it and let people get whatever answers they want, until closer to when it becomes actually necessary (due to the greater-planning-capabilities approaching).

taintegral a day ago | parent | prev | next [-]

> 'AI safety' is a meaningless term

I disagree with this assertion. As you said, safety is an attribute of action. We have many of examples of artificial intelligence which can take action, usually because they are equipped with robotics or some other route to physical action.

I think whether providing information counts as "taking action" is a worthwhile philosophical question. But regardless of the answer, you can't ignore that LLMs provide information to _humans_ which are perfectly capable of taking action. In that way, 'AI safety' in the context of LLMs is a lot like knife safety. It's about being safe _with knives_. You don't give knives to kids because they are likely to mishandle them and hurt themselves or others.

With regards to censorship - a healthy society self-censors all the time. The debate worth having is _what_ is censored and _why_.

rustcleaner a day ago | parent [-]

Almost everything about tool, machine, and product design in history has been an increase in the force-multiplication of an individual's labor and decision making vs the environment. Now with Universal Machine ubiquity and a market with rich rewards for its perverse incentives, products and tools are being built which force-multiply the designer's will absolutely, even at the expense of the owner's force of will. This and widespread automated surveillance are dangerous encroachments on our autonomy!

pixl97 a day ago | parent [-]

I mean then build your own tools.

Simply put the last time we (as in humans) had full self autonomy was sometime we started agriculture. After that point the idea of ownership and a state has permeated human society and have had to engage in tradeoffs.

gmuslera a day ago | parent | prev | next [-]

As a tool, it can be misused. It gives you more power, so your misuses can do more damage. But forcing training wheels on everyone, no matter how expert the user may be, just because a few can misuse it stops also the good/responsible uses. It is a harm already done on the good players just by supposing that there may be bad users.

So the good/responsible users are harmed, and the bad users take a detour to do what they want. What is left in the middle are the irresponsible users, but LLMs can already evaluate enough if the user is adult/responsible enough to have the full power.

rustcleaner a day ago | parent [-]

Again, a good (in function) hammer, knife, pen, or gun does not care who holds it, it will act to the maximal best of its specifications up to the skill-level of the wielder. Anything less is not a good product. A gun which checks owner is a shitty gun. A knife which rubberizes on contact with flesh is a shitty knife, even if it only does it when it detects a child is holding it or a child's skin is under it! Why? Show me a perfect system? Hmm?

Spivak a day ago | parent [-]

> A gun which checks owner is a shitty gun

You mean the guns with the safety mechanism to check the owner's fingerprints before firing?

Or sawstop systems which stop the law when it detects flesh?

freeamz a day ago | parent | prev | next [-]

Interesting. How does this compare to abliteration of LLM? What are some 'debug' tools to find out the constrain of these models?

How does pasting a xml file 'jailbreaks' it?

ramoz a day ago | parent | prev | next [-]

The real issue is going to be autonomous actioning (tool use) and decision making. Today, this starts with prompting. We need more robust capabilities around agentic behavior if we want less guardrailing around the prompt.

a day ago | parent | prev | next [-]
[deleted]
a day ago | parent | prev | next [-]
[deleted]
SpicyLemonZest a day ago | parent | prev | next [-]

A library book which produces instructions to produce a bomb is dangerous. I don't think dangerous books should be illegal, but I don't think it's meaningless or "censorship" for a company to decide they'd prefer to publish only safer books.

linkjuice4all a day ago | parent | prev | next [-]

Nothing about this is censorship. These companies spent their own money building this infrastructure and they let you use it (even if you pay for it you agreed to their terms). Not letting you map an input query to a search space isn’t censoring anything - this is just a limitation that a business placed on their product.

As you mentioned - if you want to infer any output from a large language model then run it yourself.

Angostura a day ago | parent | prev | next [-]

So in summary - shut down all online LLMs?

LeafItAlone a day ago | parent | prev | next [-]

I’m fine with calling it censorship.

That’s not inherently a bad thing. You can’t falsely yell “fire” in a crowded space. You can’t make death threats. You’re generally limited on what you can actually say/do. And that’s just the (USA) government. You are much more restricted with/by private companies.

I see no reason why safeguards, or censorship, shouldn’t be applied in certain circumstances. A technology like LLMs certainly are type for abuse.

eesmith a day ago | parent [-]

> You can’t falsely yell “fire” in a crowded space.

Yes, you can, and I've seen people do it to prove that point.

See also https://en.wikipedia.org/wiki/Shouting_fire_in_a_crowded_the... .

bpfrh a day ago | parent | next [-]

>...where such advocacy is directed to inciting or producing imminent lawless action and is likely to incite or produce such action...

This seems to say there is a limit to free speech

>The act of shouting "fire" when there are no reasonable grounds for believing one exists is not in itself a crime, and nor would it be rendered a crime merely by having been carried out inside a theatre, crowded or otherwise. However, if it causes a stampede and someone is killed as a result, then the act could amount to a crime, such as involuntary manslaughter, assuming the other elements of that crime are made out.

Your own link says that if you yell fire in a crowded space and people die you can be held liable.

wgd a day ago | parent | next [-]

Ironically the case in question is a perfect example of how any provision for "reasonable" restriction of speech will be abused, since the original precedent we're referring to applied this "reasonable" standard to...speaking out against the draft.

But I'm sure it's fine, there's no way someone could rationalize speech they don't like as "likely to incite imminent lawless action"

eesmith a day ago | parent | prev [-]

Yes, and ...? Justice Oliver Wendell Holmes Jr.'s comment from the despicable case Schenck v. United States, while pithy enough for you to repeat it over a century later, has not been valid since 1969.

Remember, this is the case which determined it was lawful to jail war dissenters who were handing out "flyers to draft-age men urging resistance to induction."

Please remember to use an example more in line with Brandenburg v. Ohio: "falsely shouting fire in a theater and causing a panic".

> Your own link says that if you yell fire in a crowded space and people die you can be held liable.

(This is an example of how hard it is to dot all the i's when talking about this phrase. It needs a "falsely" as the theater may actually be on fire.)

bpfrh a day ago | parent [-]

Yes, if your comment is strictly read, you are right that your are allowed to scream fire in a crowded space

I think that the "you are not allowed to scream fire" argument kinda implies that there is not a fire and it creates a panic which leads to injuries

I read the wikipedia article about brandenburg, but I don't quite understand how it changes the part about screaming fire in a crowded room.

Is it that it would fall under causing a riot(and therefore be against the law/government)?

Or does it just remove any earlier restrictions if any?

Or where there never any restrictions and it was always just the outcome that was punished?

Because most of the article and opinions talk about speech against law and government.

19 hours ago | parent | prev [-]
[deleted]
TZubiri 16 hours ago | parent | prev | next [-]

It's not insignificant, if a company is putting out a free product foe the masses, it's good that they limit malicious usage. And in this case, malicious or safe, refers to legal.

That said, one should not conflate a free version blocking malicious usage, with AI being safe or not used maliciously at all.

It's just a small subset

colechristensen a day ago | parent | prev | next [-]

An LLM will happily give you instructions to build a bomb which explodes while you're making it. A book is at least less likely to do so.

You shouldn't trust an LLM to tell you how to do anything dangerous at all because they do very frequently entirely invent details.

blagie a day ago | parent [-]

So do books.

Go to the internet circa 2000, and look for bomb-making manuals. Plenty of them online. Plenty of them incorrect.

I'm not sure where they all went, or if search engines just don't bring them up, but there are plenty of ways to blow your fingers off in books.

My concern is that actual AI safety -- not having the world turned into paperclips or other extinction scenarios are being ignored, in favor of AI user safety (making sure I don't hurt myself).

That's the opposite of making AIs actually safe.

If I were an AI, interested in taking over the world, I'd subvert AI safety in just that direction (AI controls the humans and prevents certain human actions).

pixl97 a day ago | parent | next [-]

>My concern is that actual AI safety

While I'm not disagreeing with you, I would say you're engaging in the no true Scotsman fallacy in this case.

AI safety is: Ensuring your customer service bot does not tell the customer to fuck off.

AI safety is: Ensuring your bot doesn't tell 8 year olds to eat tide pods.

AI safety is: Ensuring your robot enabled LLM doesn't smash peoples heads in because it's system prompt got hacked.

AI safety is: Ensuring bots don't turn the world into paperclips.

All these fall under safety conditions that you as a biological general intelligence tend to follow unless you want real world repercussions.

blagie an hour ago | parent [-]

These are clearly AI safety:

* Ensuring your robot enabled LLM doesn't smash peoples heads in because it's system prompt got hacked.

* Ensuring bots don't turn the world into paperclips.

This is borderline:

* Ensuring your bot doesn't tell 8 year olds to eat tide pods.

I'd put this in a similar category is knives in my kitchen. If my 8-year-old misuses a knife, that's the fault of the adult and not the knife. So it's a safety concern about the use of the AI, but not about the AI being unsafe. Parents should assume 8-year-old shouldn't be left unsupervised with AIs.

And this has nothing to do with safety:

* Ensuring your customer service bot does not tell the customer to fuck off.

colechristensen 20 hours ago | parent | prev [-]

You're worried about Skynet, the rest of us are worried about LLMs being used to replace information sources and doing great harm as a result. Our concerns are very different, and mine is based in reality while yours is very speculative.

I was trying to get an LLM to help me with a project yesterday and it hallucinated an entire python library and proceeded to write a couple hundred lines of code using it. This wasn't harmful, just annoying.

But folks excited about LLMs talk about how great they are and when they do make mistakes like tell people they should drink bleach to cure a cold, they chide the person for not knowing better than to trust an LLM.

blagie an hour ago | parent [-]

I am also worried about "LLMs being used to replace information sources and doing great harm as a result." What in my comment made it sound like I wasn't?

Der_Einzige a day ago | parent | prev | next [-]

I’m with you 100% until tool calling is implemented property which enables agents, which takes actions in the world.

That means that suddenly your model can actually do the necessary tasks to actually make a bomb and kill people (via paying nasty people or something)

AI is moving way too fast for you to not account for these possibilities.

And btw I’m a hardcore anti censorship and cyber libertarian type - but we need to make sure that AI agents can’t manufacture bio weapons.

jaco6 a day ago | parent | prev | next [-]

[dead]

politician a day ago | parent | prev [-]

"AI safety" is ideological steering. Propaganda, not just censorship.

latentsea a day ago | parent [-]

Well... we have needed to put a tonne of work into engineering safer outcomes for behavior generated by natural general intelligence, so...