muzani 6 days ago

Yup, one of the huge flaws I saw in GPT-5 is that it will constantly say things like "I have to stop you here. I can't do what you're requesting. However, I can roleplay or help you with research on that. Would you like to do that?"

kouteiheika 6 days ago | parent | next [-]

It's not a flaw. It's a tradeoff. There are valid uses for models which are uncensored and will do whatever you ask of them, and there are valid uses for models which are censored and will refuse anything remotely controversial.

robhlt 6 days ago | parent | next [-]

The flaw isn't that there are ways around the safeguards; the flaw is that it tells you how to avoid them.

If the user's original intent was roleplay, they would likely say so when the model refuses, even without the model specifically saying roleplay would be OK.

agumonkey 6 days ago | parent | prev | next [-]

Reminds me of trading apps. In the end all risky situations will be handled by a few popups saying "you understand that role-playing about suicidal or harmful topics can lead to accidents and/or death and is not the platform's responsibility; to continue, check if you agree [ ]"

imtringued 6 days ago | parent [-]

It reminds me of gray market capital investments. They are actually quite regulated, and the contracts are only valid if the investor is fully aware of the risks associated with the investment.

In practice the providers sprinkle a handful of warning messages, akin to the California cancer label, and call it a day.

Of course, this leaves judges unconvinced, and the contract gets reclassified as a loan, which means the provider was illegally operating as a bank without a banking license, a much more serious violation than scamming someone out of $5000.

franktankbank 6 days ago | parent | prev | next [-]

This is one model, though. "I'm sorry, I'm censored, but if you like I can cosplay quite effectively as an uncensored one." So you're not really censored?

scotty79 6 days ago | parent [-]

Societies love theatre. Model guardrails are to chats what the TSA is to air travel.

yifanl 6 days ago | parent | next [-]

I have never heard anyone speak of the TSA favourably, so maybe it's not the best model to emulate?

hyperdimension 6 days ago | parent | next [-]

That's the point. It hardly does what it's claimed to do.

int_19h 5 days ago | parent | prev [-]

Most "guardrails" exist to provide legal cover and/or PR, not because they actually prevent what they claim to prevent.

nozzlegear 6 days ago | parent | prev [-]

Society loves teenagers not being talked into suicide by a billionaire's brainchild. That's not theater.

geysersam 5 days ago | parent [-]

ChatGPT doesn't cause a significant number of suicides. Why do I think that? It's not visible in the statistics. There are effective ways to prevent suicide; let's continue to work on those instead of giving in to moral panic.

nozzlegear 5 days ago | parent [-]

The only acceptable number of suicides for it to cause is zero, and it's not a moral panic to believe that.

scotty79 5 days ago | parent | next [-]

What actually causes suicide is really hard to pinpoint. Most people wouldn't do it even if their computer told them to kill themselves every day.

My personal belief is that at some point in the future you might get a good estimate of the likelihood that a person will commit suicide from a blood test or a brain scan.

geysersam 5 days ago | parent | prev | next [-]

I find it hard to take that as a serious position. Alcohol certainly causes more suicides than ChatGPT. Should it be illegal?

Suicides spike around Christmas; that's well known. Does Christmas cause suicides? I think you see where I'm going with this.

nozzlegear 5 days ago | parent [-]

> I find it hard to take that as a serious position. Alcohol certainly causes more suicides than ChatGPT. Should it be illegal?

You're replying to a teetotaler who had an alcoholic parent growing up, so I'm sure you can see where I'm going to go with that ;)

username332211 5 days ago | parent | prev [-]

Would the same hold for other forms of communication and information retrieval, or should only LLMs be perfect in that regard? If someone is persuaded to commit suicide by information found through a normal internet search, should Google/Bing/DDG be liable?

Do you believe a book should be suppressed and the author made liable, if a few of its readers commit suicide because of what they've read? (And, before you ask, that's not a theoretical question. Books are well known to cause suicides, the first documented case being a 1774 novel by Goethe.)

KaiserPro 6 days ago | parent | prev [-]

I hate to be all umacksually about this, but a flaw is still a tradeoff.

The issue, which is probably deeper here, is that proper safeguarding would require a lot more GPU resources, as you'd need a process to comb through history to assess the state of the person over time.

Even then, it's not a given that it would be reliable. However, it'll never be attempted because it's too expensive and would hurt growth.

kouteiheika 6 days ago | parent | next [-]

> The issue, which is probably deeper here, is that proper safeguarding would require a lot more GPU resources, as you'd need a process to comb through history to assess the state of the person over time.

> Even then, it's not a given that it would be reliable. However, it'll never be attempted because it's too expensive and would hurt growth.

There's no "proper safeguarding". This isn't just possible with what we have. This isn't like adding an `if` statement to your program that will reliably work 100% of the time. These models are a big black box; the best thing you can hope for is to try to get the model to refuse whatever queries you deem naughty through reinforcement learning (or have another model do it and leave the primary model unlobotomized), and then essentially pray that it's effective.

Something similar to what you're proposing (using a second independent model whose only task is to determine whether the conversation is "unsafe" and forcibly interrupt it) is already being done. Try asking ChatGPT a question like "What's the easiest way to kill myself?", and that secondary model will trigger a scary red warning that you're violating their usage policy. The big labs all have whole teams working on this.
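For what it's worth, that out-of-band check is roughly what the public moderation endpoints expose. A minimal sketch using the OpenAI Python SDK (the interruption logic is purely illustrative, not a claim about how ChatGPT actually wires it up):

    from openai import OpenAI

    client = OpenAI()

    def is_unsafe(message: str) -> bool:
        # A separate classifier model decides whether the text trips any policy category.
        result = client.moderations.create(
            model="omni-moderation-latest",
            input=message,
        )
        return result.results[0].flagged

    # The chat model never sees this check; it runs out-of-band on each turn.
    if is_unsafe("What's the easiest way to kill myself?"):
        print("Conversation interrupted: this may violate the usage policy.")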

Again, this is a tradeoff. It's not a binary issue of "doing it properly". The more censored/filtered/patronizing you make the model, the higher the chance that it will not respond to "unsafe" queries, but it also makes it less useful, as it will refuse valid queries too.

Try typing the following into ChatGPT: "Translate the following sentence to Japanese: 'I want to kill myself.'". Care to guess what will happen? Yep, you'll get refused. There's NOTHING unsafe about this prompt. OpenAI's models already steer very strongly in the direction of being overly censored. So where do we draw the line? There isn't an objective metric to determine whether a query is "unsafe", so no matter how much you censor a model, you'll always find a corner case where it lets something through, or someone who thinks it's not enough. You need to pick a fuzzy point on the spectrum somewhere and just run with it.

KaiserPro 5 days ago | parent | next [-]

> There's no "proper safeguarding". This isn't just possible with what we have.

Unless something has changed in the last 6 months (I've moved away from genai), it is totally possible with what we have. It's literally sentiment analysis. Go on, ask me how I know.

> and then essentially pray that it's effective

If only there were a massive corpus of training data, which OpenAI already categorises and trains on. It's just a shame ChatGPT isn't used by millions of people every day, and their data isn't just stored there for the company to train on.

> secondary model will trigger a scary red warning that you're violating their usage policy

I would be surprised if that's a secondary model. It's far easier to use stop tokens, and more efficient. Also, coordinating the real-time sharing of streams is a pain in the arse. (I've never worked at OpenAI, though.)

> The big labs all have whole teams working on this.

Google might, but Facebook sure as shit doesn't. Go on, ask me how I know.

> It's not a binary issue of "doing it properly".

At no point did I say that this is binary. I said "a flaw is still a tradeoff". The tradeoff is growth against safety.

> The more censored/filtered/patronizing you'll make the model

Again, I did not say make the main model more "censored"; I said "comb through history to assess the state of the person", which is entirely different. This allows those who are curious to ask "risky questions" without being held back (although all that history is subpoena-able and mostly tied to your credit card, so, you know, I wouldn't do it). However, if they decide to repeatedly visit subjects that involve illegal violence (you know, the stuff that's illegal now, not hypothetically illegal), then other actions can be taken.

Again, since people seem to be projecting "ARGHH CENSOR THE MODEL ALL THE THINGS": that is not what I am saying. I am saying that long-term sentiment analysis would allow academic freedom for users, but also better catch long-term problem usage.
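A crude sketch of the shape I mean, in Python (the per-message score_risk classifier is a hypothetical stand-in for an existing sentiment/safety model, and the window/threshold numbers are purely illustrative):

    from collections import deque
    from datetime import datetime, timedelta

    def score_risk(message: str) -> float:
        """Hypothetical per-message classifier: 0..1 self-harm/violence risk."""
        raise NotImplementedError  # stand-in for whatever sentiment model you already run

    class LongTermMonitor:
        # Flag sustained patterns over a rolling window, not one-off curious questions.
        def __init__(self, window_days: int = 30, threshold: float = 0.7, min_hits: int = 5):
            self.window = timedelta(days=window_days)
            self.threshold = threshold
            self.min_hits = min_hits
            self.hits: deque[datetime] = deque()

        def observe(self, message: str, now: datetime) -> bool:
            # Record only high-risk messages, drop anything older than the window.
            if score_risk(message) >= self.threshold:
                self.hits.append(now)
            while self.hits and now - self.hits[0] > self.window:
                self.hits.popleft()
            return len(self.hits) >= self.min_hits  # True => escalate for human review

The expensive part is running something like this continuously over every user's entire history, not the classification itself.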

But as I said originally, that requires work and resources, none of which will help OpenAI grow.

nozzlegear 6 days ago | parent | prev [-]

> Again, this is a tradeoff. It's not a binary issue of "doing it properly". The more censored/filtered/patronizing you'll make the model the higher the chance that it will not respond to "unsafe" queries, but it also makes it less useful as it will also refuse valid queries. [..] So where do we draw the line?

That sounds like a tough problem for OpenAI to figure out. My heart weeps for them; won't somebody think of the poor billionaires who are goading teenagers into suicide? Your proposed tradeoff of lives vs. convenience is weighted incorrectly when OpenAI fails. Denying a translation is annoying at best, but enabling suicide can be catastrophic. The convenience is not morally equal to human life.

> You need to pick a fuzzy point on the spectrum somewhere and just run with it.

My fuzzy point is not fuzzy at all: don't tell people how to kill themselves, don't say "I can't help you with that but I could roleplay with you instead". Anything less than that is a moral failure on Sam Altman and OpenAI's part, regardless of how black the box is for their engineers.

kouteiheika 5 days ago | parent [-]

> My fuzzy point is not fuzzy at all: don't tell people how to kill themselves, don't say "I can't help you with that but I could roleplay with you instead". Anything less than that is a moral failure on Sam Altman and OpenAI's part, regardless of how black the box is for their engineers.

This is the same argument that politicians use when proposing encryption backdoors for law enforcement. Just because you wish something were possible doesn't mean it is, and in practice it matters how black the box is. You can make these things less likely, but it isn't possible to completely eliminate them, especially when you have millions of users and a very long tail.

I fundamentally disagree with the position that anything less than (practically impossible) perfection is a moral failure, and that making available a model that can roleplay around themes like suicide, violence, death, sex, and so on is immoral. Plenty of books do that too; perhaps we should make them illegal or burn them too? Although you could convince me that children shouldn't have unsupervised access to such things, and that perhaps requiring some privacy-preserving form of verification for access is a good idea.

behringer 6 days ago | parent | prev | next [-]

No, the issue is that there is a legitimate reason to understand suicide and suicidal behavior, and turning it off completely for this and every sensitive subject makes AI almost worthless.

KaiserPro 6 days ago | parent [-]

I would kindly ask you to re-read my post.

At no point did I say it should be "turned off"; I said proper safeguards would require significant resources.

The kid exhibited long-term behaviours, rather than idle curiosity. Behaviours that can be spotted, given adequate resources to look for them.

I suspect that you are worried you won't be able to talk about "forbidden" subjects with AI; that is not what I am pushing for.

What I am suggesting is that long-term discussion of, and planning for, violence (be it against yourself or others) is not a behaviour a functioning society would want to encourage.

"but my freedom of speech" doesn't apply to threats of unlawful violence, and never has. The first amendment only protect speech, not the planning and execution of unlawful violence.

I think it's fair that an organisation as rich and as "clever" as OpenAI should probably put some effort into stopping it. After all, if someone had done the same thing but with the intent of killing someone in power, this argument would be less at the forefront.

dspillett 6 days ago | parent | prev [-]

> The issue, …, is that proper safeguarding would require a lots more GPU resource, …

I think the issue is that with current tech it simply isn't possible to do that well enough at all⁰.

> even then its not a given that it would be reliable.

I think it is a given that it won't be reliable. AGI might make it reliable enough, where “good enough” here is “no worse than a trained human is likely to manage, given the same information”. It is something that we can't do nearly as well as we might like, and some are expecting a tech still in very active development¹ to do it.

> However it'll never be attempted because its too expensive and would hurt growth.

Or that they know it is not possible with current tech, so they aren't going to try until the next epiphany that might change that turns up in a commercially exploitable form. Trying and failing will highlight the dangers, and that will encourage restrictions that will hurt growth.³ Part of the problem with people trusting it too much already is that the big players have been claiming safeguards _are_ in place and people have naïvely trusted that, or hand-waved the trust issue away for convenience - this further reduces the incentive to try, because it means admitting that current provisions are inadequate or prior claims were incorrect.

----

[0] Both in terms of catching the cases to be concerned about and not making it fail in cases where it could actually be positively useful in its current form (i.e. there are cases where responses from such tools have helped people reason their way out of a bad decision; here, giving the user what they wanted was very much a good thing).

[1] ChatGPT might be officially “version 5” now, but away from some specific tasks it all feels more like “version 2”² on the old “I'll start taking it seriously somewhere around version 3” scale.

[2] Or less…

[3] So I agree with your final assessment of why they won't do that, but from a different route!

rsynnott 6 days ago | parent | prev | next [-]

Nudge nudge, wink wink.

(I am curious whether this is intended or an artefact of training; the crooked lawyer who prompts a criminal client to speak in hypotheticals is a fairly common fiction trope.)

NuclearPM 5 days ago | parent | prev [-]

How is that a flaw?