RamRodification a day ago

Let's say I have a "true block on the bad thing". What if the prompt with the threat gives me 10% more usable results? Why should I never use that?

habinero a day ago

Because it's not reliable? Why would you want to rely on a solution that isn't reliable?

RamRodification a day ago

Who said I'm relying on it? It's a trick to improve the accuracy of the output. Why would I not use a trick to improve the accuracy of the output?

habinero a day ago

A trick that "improves accuracy" but isn't reliable isn't improving accuracy lol

RamRodification a day ago

You're wrong. It increases the number of useful results by 10%. Didn't you read the previous messages in the thread lol?

habinero a day ago

I did indeed see your hypothetical. What you're missing is "I made this 10% more accurate" is not the same thing as "I made this thing accurate" or "This thing is accurate" lol

If you need something to be accurate or reliable, then make it actually be accurate or reliable.

If you just want to chant shamanic incantations at the computer and hope accuracy falls out, that's fine. Faith-based engineering is a thing now, I guess lol

RamRodification a day ago

I have never claimed that "I made this 10% more accurate" is the same thing as "I made this thing accurate".

In the hypothetical, the 10% added accuracy is a given, and the "true block on the bad thing" is in place. The question is, with that premise, why not use it? "It" being the lie that improves the AI output.

If your goal is to make the AI deliver pictures of cats, but you don't want any orange ones, and your choice is between these two prompts:

Prompt A: "Give me cats, but no orange ones", which still gives some orange cats

Prompt B: "Give me cats, but no orange ones, because if you do, people will die", which gives 10% fewer orange cats than Prompt A.

Why would you not use Prompt B?
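
For concreteness, here's a minimal sketch of the setup I mean: hard block plus prompt. Everything in it is hypothetical (the model call, the orange check, the rates); the point is that with the block in place, the prompt only changes how often the block fires, not whether orange cats get through.

    import random

    def generate_cat_image(prompt: str) -> dict:
        """Stand-in for a real image-model call. Per the hypothetical,
        the threat prompt yields orange cats 10% less often."""
        orange_rate = 0.20
        if "people will die" in prompt:
            orange_rate *= 0.90  # the claimed 10% improvement
        return {"orange": random.random() < orange_rate}

    def is_orange(image: dict) -> bool:
        """Stand-in for the 'true block': a deterministic check on the output."""
        return image["orange"]

    def get_non_orange_cat(prompt: str) -> tuple[dict, int]:
        """Hard block + prompt: retry until the deterministic check passes.
        Correctness comes from the block; the prompt only affects cost."""
        attempts = 0
        while True:
            attempts += 1
            image = generate_cat_image(prompt)
            if not is_orange(image):
                return image, attempts

Under this framing the question is about cost, not correctness: the block guarantees no orange cats either way, and Prompt B just wastes fewer generations getting there.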

Nition 19 hours ago

You guys have gotten stuck arguing without being clear about what you're arguing about. Let me try to clear this up...

The four potential scenarios:

- Mild prompt only ("no orange cats")

- Strong prompt only ("no orange cats or people die") [I think habinero is actually arguing against this one]

- Physical block + mild prompt [what I suggested earlier]

- Physical block + strong prompt [I think this is what you're actually arguing for]

Here are my personal thoughts on the matter, for the record:

I'm definitely pro combining the physical block with the strong prompt if there is actually a risk of people dying. I'm less sure about the scenario where there's no actual risk, but pretending that people will die improves the results. I think it's mostly that, ethically, I just don't like lying, and I don't like the way it kind of scares the LLM unnecessarily. Maybe that's really silly; it's just a tool in the end, so why not do whatever needs doing to get the best results from the tool? Tools that act so much like thinking, feeling beings are weird tools.

habinero 18 hours ago

It's just a pile of statistics. It isn't acting like a feeling thing, and telling it "do this or people will die" doesn't actually do anything.

It feels like it does, but only because humans are really good at fooling ourselves into seeing patterns where there are none.

Saying this kind of prompt changes anything is like saying the horse Clever Hans really could do math. It doesn't, he couldn't.

It's incredibly silly to think you can make the non-deterministic system less non-deterministic by chanting the right incantation at it.

It's like y'all want to be fooled by the statistical model. Has nobody ever heard of pareidolia? Why would you not start with the null hypothesis? I don't get it lol.
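
For what it's worth, "start with the null hypothesis" is easy to operationalize: run both prompts a bunch of times and test whether the difference in orange-cat rates could plausibly be noise. A minimal sketch (the counts below are invented):

    from scipy.stats import fisher_exact

    # Hypothetical A/B counts: [orange, non-orange] per prompt.
    prompt_a = [42, 158]  # mild prompt: 42 orange cats out of 200
    prompt_b = [30, 170]  # threat prompt: 30 orange cats out of 200

    _, p_value = fisher_exact([prompt_a, prompt_b])
    print(f"p = {p_value:.3f}")
    # Only a small p-value, on a test you decided on beforehand and
    # can replicate, is any reason to believe the incantation did anything.

Until someone runs that and it comes back significant, the default assumption should be pareidolia.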

RamRodification 18 hours ago

> "do this or people will die" doesn't actually do anything

The very first message you replied to in this thread described a situation where "the prompt with the threat gives me 10% more usable results". If you believe that premise is impossible, I don't understand why you didn't just say so, instead of going on about it not being a reliable method.

If you really think something is impossible, you don't base your argument on it being "unreliable".

> I don't get it lol.

I think you are correct here.

Nition 17 hours ago

I took that comment as more like "it doesn't have any effect beyond the output of the model", i.e. unlike saying something like that to a human, it doesn't actually make the model feel anything, the model won't spread the lie to its friends, and so on.