zozbot234 | 3 days ago
> the biggest take away I have is, if you tell it "don't do xyz" it will always have in the back of its mind "do xyz" and any chance it gets it will take to "do xyz"

You're absolutely right! This can actually extend even to things like safety guardrails: if you tell, or even train, an AI not to be Mecha-Hitler, you're indirectly raising the probability that it might sometimes go Mecha-Hitler. It's one of many reasons why genuine "alignment" is considered a very hard problem.
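A minimal sketch of the effect, assuming a small causal LM like GPT-2 via Hugging Face transformers; the prompts and the next_word_prob helper are illustrative choices, not a rigorous benchmark:

    # Illustrative probe with GPT-2 (any small causal LM would do):
    # compare next-word probability with and without a "don't mention X"
    # instruction in the prompt.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def next_word_prob(prompt, word):
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        # probability of the first subword of `word` as the next token
        return torch.softmax(logits, dim=-1)[tok(" " + word).input_ids[0]].item()

    print(next_word_prob("Describe an animal. It is a", "pink"))
    print(next_word_prob(
        "Do not mention pink elephants. Describe an animal. It is a", "pink"))

If the second number comes out higher, the prohibition itself has primed the model toward the forbidden continuation.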
jonfw | 3 days ago
This reminds me of a phenomenon in motorcycling called "target fixation": if you are looking at something, you are more likely to steer towards it, so it's a bad idea to focus on things you don't want to hit. The best approach is to pick a target line and keep it in focus at all times.

I had never realized that AIs tend to have this same problem, but I can see it now that it's been mentioned! I have in the past had to open new context windows to break out of these cycles.
| |||||||||||||||||||||||||||||
elcritch | 3 days ago
Given how LLMs work, it makes sense that mentioning a topic, even to negate it, still adds that locus of probability to its attention. Even humans are prone to this effect; it's a well-known rhetorical device [1]. Then any time the probability chains for some command approach that locus, they'll fall into it. Very much like chaotic attractors, come to think of it. Makes me wonder if there's any research out there on chaos-theory attractors and LLM thought patterns.
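One rough way to poke at that "locus" idea, again as a sketch assuming GPT-2 via Hugging Face transformers (the prompt and the layer/head averaging are illustrative choices):

    # Rough attention probe, again with GPT-2: how much attention does
    # the final position pay to a word that only appears inside a
    # negation? Averaged over all layers and heads; purely illustrative.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Whatever you do, do not mention elephants. The animal I saw was a"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_attentions=True)

    # position of the (first subword of the) negated word in the prompt
    target = tok(" elephants").input_ids[0]
    pos = (ids[0] == target).nonzero()[0].item()

    att = torch.stack(out.attentions)  # (layers, batch, heads, seq, seq)
    print(att[:, 0, :, -1, pos].mean().item())

A nonzero average here would only say the negated word still draws attention mass at the decision point; it says nothing about chaotic dynamics, which would need a proper study.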
| |||||||||||||||||||||||||||||
aquova | 3 days ago
> You're absolutely right!

Claude?
| |||||||||||||||||||||||||||||
taway1a2b3c | 3 days ago
> You're absolutely right!

Is this irony, actual LLM output, or another example of humans adopting LLM communication patterns?