ryao 3 days ago

The idiom “X loves to Y” implies frequency, rather than agency. Would you object to someone saying “It loves to rain in Seattle”?

“Malicious compliance” is the act of following instructions in a way that is contrary to their intent. The word malicious is part of the term. Whether a thing is itself malicious when it exercises malicious compliance is tangential to whether it has exercised malicious compliance.

That said, I have gotten good results by adding an addendum to my prompts to account for malicious compliance. I wonder if your comment is due to some psychological need to avoid the appearance of personifying a machine. I further wonder if you are one of the people who are upset when I say “the machine is thinking” about an LLM still in prompt processing, but had no problem with “the machine is thinking” while waiting for a DOS machine to respond to a command in the 90s. This recent outrage over personifying machines since LLMs came onto the scene is several decades late, considering that we have been personifying machines in our speech since the first electronic computers in the 1940s.

By the way, if you actually try what you suggested, you will find that the LLM will enter a Laurel and Hardy routine with you, where it will repeatedly make the mistake for you to correct. I have experienced this firsthand so many times that I have learned to preempt the behavior by telling the LLM not to maliciously comply at the beginning when I tell it what not to do.
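The preemptive tactic described above amounts to putting the constraint, plus an explicit anti-malicious-compliance clause, into the prompt before the first model turn rather than correcting afterward. A minimal sketch of that transcript construction (hypothetical helper name, no particular LLM API assumed):

```python
def build_messages(task: str, forbidden: list[str]) -> list[dict]:
    """Build a chat transcript that preempts malicious compliance.

    The "do not" list and the instruction not to maliciously comply go
    into the system prompt up front, so the model never produces the bad
    output that would otherwise need turn-by-turn correction.
    """
    dont = "; ".join(forbidden)
    system = (
        f"Do not do the following: {dont}. "
        "Follow the intent of these instructions, not just their letter. "
        "Do not maliciously comply by technically avoiding the listed "
        "behaviors while reproducing them in another form."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

msgs = build_messages("Summarize the report.", ["ellipses", "bullet lists"])
```

The message list would then be sent unchanged on every turn, so the constraint stays ahead of the conversation instead of trailing it as corrections.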

brookst 3 days ago

I work on consumer-facing LLM tools, and see A/B tests on prompting strategy daily.

YMMV on specifics, but please consider the possibility that you may benefit from working on your prompting, and that not all behaviors you see are intrinsic to all LLMs or impossible to address with improved (usually simpler, clearer, shorter) prompts.

ryao 3 days ago

It sounds like you are used to short conversations with few turns. In conversations with dozens, hundreds, or thousands of turns, prompting to keep bad output from entering the context is generally better than prompting to correct output after the fact. This is due to how in-context learning works: the LLM tends to regurgitate things that are already in its context.
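One way to keep bad output from entering the context at all is to gate each model reply before appending it to the transcript, resampling instead of letting the unwanted pattern in. A sketch under that assumption (hypothetical helper name, plain message dicts):

```python
def append_if_clean(history: list[dict], reply: str,
                    banned: tuple[str, ...] = ("…", "...")) -> bool:
    """Let a model reply enter the transcript only if it is free of the
    unwanted pattern; otherwise signal the caller to regenerate.

    Keeping the bad output out of the context matters because in-context
    learning makes the model imitate whatever is already there.
    """
    if any(b in reply for b in banned):
        return False  # caller should resample rather than append
    history.append({"role": "assistant", "content": reply})
    return True
```

A reply containing an ellipsis is rejected and the transcript stays clean, whereas correcting after the fact would leave the offending turn in context for the model to copy.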

That said, every LLM has its quirks. For example, Gemini 1.5 Pro and related LLMs have a quirk where, if you tolerate a single ellipsis in the output, the output will progressively gain ellipses until every few words is followed by an ellipsis, and responses to prompts asking it to stop outputting ellipses include ellipses anyway. :/