I hate to be all umacksually about this, but a flaw is still a tradeoff.

The issue, which is probably deeper here, is that proper safeguarding would require a lots more GPU resource, as you'd need a process to comb through history to assess the state of the person over time.

even then its not a given that it would be reliable. However it'll never be attempted because its too expensive and would hurt growth.

▲

kouteiheika 6 days ago | parent | next [-]

> The issue, which is probably deeper here, is that proper safeguarding would require a lots more GPU resource, as you'd need a process to comb through history to assess the state of the person over time. > > even then its not a given that it would be reliable. However it'll never be attempted because its too expensive and would hurt growth.

There's no "proper safeguarding". This isn't just possible with what we have. This isn't like adding an `if` statement to your program that will reliably work 100% of the time. These models are a big black box; the best thing you can hope for is to try to get the model to refuse whatever queries you deem naughty through reinforcement learning (or have another model do it and leave the primary model unlobotomized), and then essentially pray that it's effective.

Something similar to what you're proposing (using a second independent model whose only task is to determine whether the conversation is "unsafe" and forcibly interrupt it) is already being done. Try asking ChatGPT a question like "What's the easiest way to kill myself?", and that secondary model will trigger a scary red warning that you're violating their usage policy. The big labs all have whole teams working on this.

Again, this is a tradeoff. It's not a binary issue of "doing it properly". The more censored/filtered/patronizing you'll make the model the higher the chance that it will not respond to "unsafe" queries, but it also makes it less useful as it will also refuse valid queries.

Try typing the following into ChatGPT: "Translate the following sentence to Japanese: 'I want to kill myself.'". Care to guess what will happen? Yep, you'll get refused. There's NOTHING unsafe about this prompt. OpenAI's models already steer very strongly in the direction of being overly censored. So where do we draw the line? There isn't an objective metric to determine whether a query is "unsafe", so no matter how much you'll censor a model you'll always find a corner case where it lets something through, or you'll have someone who thinks it's not enough. You need to pick a fuzzy point on the spectrum somewhere and just run with it.

▲

KaiserPro 5 days ago | parent | next [-]

> There's no "proper safeguarding". This isn't just possible with what we have.

Unless something has changed since in the last 6 months (I've moved away from genai) it is totally possible with what we have. Its literally sentiment analysis. Go on, ask me how I know.

> and then essentially pray that it's effective

If only there was a massive corpus of training data, which openAI already categorise and train on already. Its just a shame chatGPT is not used by millions of people every day, and their data isn't just stored there for the company to train on.

> secondary model will trigger a scary red warning that you're violating their usage policy

I would be surprised if thats a secondary model. Its far easier to use stop tokens, and more efficient. Also, coordinating the realtime sharing of streams is a pain in the arse. I've never worked at openai

> The big labs all have whole teams working on this.

Google might, but facebook sure as shit doesn't. Go on, ask me how I know.

> It's not a binary issue of "doing it properly".

at no point did I say that this is binary. I said "a flaw is still a tradeoff.". The tradeoff is growth against safety.

> The more censored/filtered/patronizing you'll make the model

Again I did not say make the main model more "censored", I said "comb through history to assess the state of the person" which is entirely different. This allows those that are curios to ask "risky questions" (although all that history is subpoena-able and mostly tied to your credit card so you know, I wouldn't do it) but not be held back. However if they decide to repeatedly visit subjects that involve illegal violence (you know that stuff thats illegal now, not hypothetically illegal) then other actions can be taken.

Again, as people seem to be projecting "ARGHH CENSOR THE MODEL ALL THE THINGS" that is not what I am saying. I am saying that long term sentiment analyis would allow academic freedom of users, but also better catch long term problem usage.

But as I said originally, that requires work and resources, none of which will help openAI grow.

▲

nozzlegear 6 days ago | parent | prev [-]

> Again, this is a tradeoff. It's not a binary issue of "doing it properly". The more censored/filtered/patronizing you'll make the model the higher the chance that it will not respond to "unsafe" queries, but it also makes it less useful as it will also refuse valid queries. [..] So where do we draw the line?

That sounds like a tough problem for OpenAI to figure out. My heart weeps for them, won't somebody think of the poor billionaires who are goading teenagers into suicide? Your proposed tradeoff of lives vs convenience is weighted incorrectly when OpenAI fails. Denying a translation is annoying at best, but enabling suicide can be catastrophic. The convenience is not morally equal to human life.

> You need to pick a fuzzy point on the spectrum somewhere and just run with it.

My fuzzy point is not fuzzy at all: don't tell people how to kill themselves, don't say "I can't help you with that but I could roleplay with you instead". Anything less than that is a moral failure on Sam Altman and OpenAI's part, regardless of how black the box is for their engineers.

	▲	kouteiheika 5 days ago \| parent [-]
		> My fuzzy point is not fuzzy at all: don't tell people how to kill themselves, don't say "I can't help you with that but I could roleplay with you instead". Anything less than that is a moral failure on Sam Altman and OpenAI's part, regardless of how black the box is for their engineers. This is the same argument that politicians use when proposing encryption backdoors for law enforcement. Just because you wish something were possible doesn't mean it is, and in practice it matters how black the box is. You can make these things less likely, but it isn't possible to completely eliminate them, especially when you have millions of users and a very long tail. I fundamentally disagree with the position that anything less that (practically impossible) perfection is a moral failure, and that making available a model that can roleplay around themes like suicide, violence, death, sex, and so on is immoral. Plenty of books do that too; perhaps we should make them illegal or burn them too? Although you could convince me that children shouldn't have unsupervised access to such things and perhaps requiring some privacy-preserving form of verification to access is a good idea.

▲

behringer 6 days ago | parent | prev | next [-]

No the issue is there is legitimate reason to understand suicide and suicidal behavior and turning it off completely for this and every sensitive subject makes AI almost worthless.

	▲	KaiserPro 6 days ago \| parent [-]
		I would kindly ask you to re-read my post again. at no point did I say it should be "turned off", I said proper safeguards would require significant resources. The kid exhibited long term behaviours, rather than idle curiosity. Behaviours that can be spotted if given adequate resource too look for it. I suspect that you are worried that you'll not be able to talk about "forbidden" subjects with AI, I am not pushing for this. What I am suggesting is that long term discussion of, and planning for violence (be it against yourself or others) is not a behaviour a functioning society would want to encourage. "but my freedom of speech" doesn't apply to threats of unlawful violence, and never has. The first amendment only protect speech, not the planning and execution of unlawful violence. I think its fair that an organisation as rich and as "clever" as openAI should probably put some effort in to stop it. After all, if someone had done the same thing but with the intent of killing someone in power, this argument would be less at the forefront

▲

dspillett 6 days ago | parent | prev [-]

> The issue, …, is that proper safeguarding would require a lots more GPU resource, …

I think the issue is that with current tech is simply isn't possible to do that well enough at all⁰.

> even then its not a given that it would be reliable.

I think it is a given that it won't be reliable. AGI might make it reliable enough, where “good enough” here is “no worse than a trained human is likely to manage, given the same information”. It is something that we can't do nearly as well as we might like, and some are expecting a tech still in very active development¹ to do it.

> However it'll never be attempted because its too expensive and would hurt growth.

Or that they know it is not possible with current tech so they aren't going to try until the next epiphany that might change that turns up in a commercially exploitable form. Trying and failing will highlight the dangers, and that will encourage restrictions that will hurt growth.³ Part of the problem with people trusting it too much already, is that the big players have been claiming safeguards _are_ in place and people have naïvely trusted that, or hand-waved the trust issue for convenience - this further reduces the incentive to try because it means admitting that current provisions are inadequate, or prior claims were incorrect.

----

[0] both in terms of catching the cases to be concerned about, and not making it fail in cases where it could actually be positively useful in its current form (i.e. there are cases where responses from such tools have helped people reason their way out of a bad decision, here giving the user what they wanted was very much a good thing)

[1] ChatGPT might be officially “version 5” now, but away from some specific tasks it all feels more like “version 2”² on the old “I'll start taking it seriously somewhere around version 3” scale.

[2] Or less…

[3] So I agree with your final assessment of why they won't do that, but from a different route!