Terretta 2 days ago

Since GPT-3 they've gotten better, but in practice we've found the best way to avoid this problem is to use affirmative words like "AVOID".

YES: AVOID using negations.

NO: DO NOT use negations.

Weirdly, I see the DO NOT (with caps) form in system prompts from the LLM vendors themselves, which is how we know they're hiring too fast.*

* Slight joke: negation handling seems to have been heavily trained since 4.1-ish on OpenAI's side and since 3.5 on Anthropic's side, but "avoid" still works better.
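
For concreteness, a rough sketch of the substitution (the prompt text, model name, and OpenAI Python client call are illustrative, not our production setup):

    # Sketch only: assumes OPENAI_API_KEY is set in the environment and openai>=1.0.
    from openai import OpenAI

    client = OpenAI()

    # Affirmative imperative ("AVOID ...") instead of negation ("DO NOT ...").
    system_prompt = (
        "You are a customer support assistant.\n"
        "AVOID speculating about unreleased features.\n"  # rather than: DO NOT speculate...
        "AVOID giving legal or medical advice.\n"         # rather than: DO NOT give...
    )

    resp = client.chat.completions.create(
        model="gpt-4.1",  # illustrative; any chat model works here
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "When will feature X ship?"},
        ],
    )
    print(resp.choices[0].message.content)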

Melatonic 2 days ago

I think you're really onto something here - I bet this would also work more reliably when talking to humans. Maybe it's not even specifically a fault of the AI, just a language thing in general.

An alternative test could be prompting the AI with "Avoid not ..." followed by some instruction. Logically that tells it to "do" the instruction, but maybe sometimes it would end up "avoiding" it instead?
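
Something like this, maybe, running the same instruction straight and with the double negative and comparing the outputs (the client, model, and instruction are just placeholders):

    # Sketch of the proposed test; assumes OPENAI_API_KEY is set and openai>=1.0.
    from openai import OpenAI

    client = OpenAI()

    def ask(instruction: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4.1",  # illustrative
            messages=[{"role": "user", "content": instruction}],
        )
        return resp.choices[0].message.content

    # Plain imperative vs. the double-negative framing of the same instruction.
    print(ask("End every sentence with an exclamation mark. Describe the sky."))
    print(ask("Avoid not ending every sentence with an exclamation mark. Describe the sky."))

If the "Avoid" framing dominates, the second response might skip the exclamation marks even though the logic says it should include them.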

Now that I think about it, the training data itself might very well be contaminated with this contradiction...

I can think of a lot of forum posts where the OP stipulates "I do not want X" and then the very first reply recommends X!