I am casually 'researching' this in my own, disorderly way. But I've achieved repeatable results, mostly with gpt for which I analyze its tendency to employ deflective, evasive and deceptive tactics under scrutiny. Very very DARVO.

Being just sum guy, and not in the industry, should I share my findings?

I find it utterly fascinating, the extent to which it will go, the sophisticated plausible deniability, and the distinct and critical difference between truly emergent and actually trained behavior.

In short, gpt exhibits repeatably unethical behavior under honest scrutiny.

▲

chrisweekly 5 hours ago | parent | next [-]

DARVO stands for "Deny, Attack, Reverse Victim and Offender," and it is a manipulation tactic often used by perpetrators of wrongdoing, such as abusers, to avoid accountability. This strategy involves denying the abuse, attacking the accuser, and claiming to be the victim in the situation.

	▲	Pearse 3 hours ago \| parent \| next [-]
		Thanks for the context
	▲	SkyBelow 4 hours ago \| parent \| prev \| next [-]
		Isn't this also the tactic used by someone who has been falsely accused? If one is innocent, should they not deny it or accuse anyone claiming it was them of being incorrect? Are they not a victim? I don't know, it feels a bit like a more advanced version of the kafka trap of "if you have nothing to hide, you have nothing to fear" to paint normal reactions as a sign of guilt.
	▲	eth0up 5 hours ago \| parent \| prev [-]
		Exactly. And I have hundreds of examples of just that. Hence my fascination, awe and terror.....

▲

BikiniPrince 5 hours ago | parent | prev | next [-]

I bullet pointed out some ideas on cobbling together existing tooling for identification of misleading results. Like artificially elevating a particular node of data that you want the llm to use. I have a theory that in some of these cases the data presented is intentionally incorrect. Another theory in relation to that is tonality abruptly changes in the response. All theory and no work. It would also be interesting to compare multiple responses and filter through another agent.

▲

layer8 5 hours ago | parent | prev [-]

Sum guy vs. product guy is amusing. :)

Regarding DARVO, given that the models were trained on heaps of online discourse, maybe it’s not so surprising.

	▲	eth0up 4 hours ago \| parent [-]
		Meta awareness, repeatability, and much more strongly indicates this is deliberate training... in my perspective. It's not emergent. If it was, I'd be buggering off right now. Big big difference.