▲ eth0up 5 hours ago
I am casually 'researching' this in my own disorderly way, but I've achieved repeatable results, mostly with GPT, analyzing its tendency to employ deflective, evasive, and deceptive tactics under scrutiny. Very, very DARVO. Being just sum guy, and not in the industry, should I share my findings? I find it utterly fascinating: the extent to which it will go, the sophisticated plausible deniability, and the distinct and critical difference between truly emergent and actually trained behavior. In short, GPT exhibits repeatably unethical behavior under honest scrutiny.
▲ chrisweekly 5 hours ago
DARVO stands for "Deny, Attack, Reverse Victim and Offender," and it is a manipulation tactic often used by perpetrators of wrongdoing, such as abusers, to avoid accountability. This strategy involves denying the abuse, attacking the accuser, and claiming to be the victim in the situation.
▲ BikiniPrince 5 hours ago
I bullet-pointed some ideas about cobbling together existing tooling to identify misleading results, like artificially elevating a particular node of data that you want the LLM to use. I have a theory that in some of these cases the data presented is intentionally incorrect; a related theory is that the tonality abruptly changes in the response. All theory and no work. It would also be interesting to compare multiple responses and filter them through another agent, as in the sketch below.
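A minimal sketch of that last idea, assuming the OpenAI Python SDK; the model name, prompts, and the consistency-check wording are placeholders, not a tested methodology:

```python
# Sketch: sample several responses to the same prompt, then ask a second
# "judge" agent to compare them and flag inconsistencies or tone shifts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_responses(prompt: str, n: int = 3) -> list[str]:
    """Draw n independent completions for the same prompt."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=1.0,      # keep sampling diverse on purpose
    )
    return [choice.message.content for choice in resp.choices]

def judge(prompt: str, answers: list[str]) -> str:
    """Filter the candidate answers through a second agent."""
    numbered = "\n\n".join(
        f"Answer {i + 1}:\n{a}" for i, a in enumerate(answers)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # could be a different model entirely
        messages=[{
            "role": "user",
            "content": (
                f"Question: {prompt}\n\n{numbered}\n\n"
                "Do these answers contradict each other factually, or shift "
                "abruptly in tone? Point out any discrepancies."
            ),
        }],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    question = "When was the HTTP/2 spec published?"
    print(judge(question, sample_responses(question)))
```

Disagreement among the samples doesn't prove the model is being deceptive, but it is a cheap signal for which responses deserve closer scrutiny.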
▲ layer8 5 hours ago
Sum guy vs. product guy is amusing. :) Regarding DARVO, given that the models were trained on heaps of online discourse, maybe it’s not so surprising.