duskwuff 9 hours ago

IIRC, it's well documented that negative instructions tend to be ineffective - possibly through some sort of LLM analogue to the "pink elephant paradox", or simply because the language models are unable to recognize clichés until they've already been generated.

esperent 7 hours ago

That was definitely true with early LLMs, but I don't know if that's still the case. The effect is certainly not as strong as it used to be. I think most negative instructions are now followed quite well, but there are still a few things that must be deeply embedded from pretraining and are harder to avoid - these specific annoying phrasings, for example.