Thank you, I came here to say as much in less eloquent terms.

It's not surprising to find clustered sentiment from a slice of statistically correlated language. I wouldn't call this a "personality" any more than I would say the front grill of a car has a "face".

Deterministically isolating these clusters, however, could prove to be an incredibly useful technique for both using and evaluating language models.

▲

D-Machine 3 hours ago | parent [-]

It's not even really the researchers' fault. Academic personality research in psychology is, in general, philosophically very weak, in that it almost always conflates "models of / talking about personality" with actual personality, and rarely checks whether things like the MBTI or Five-Factor Model correlate meaningfully with real behaviours.

Those that do find correlations between self-reported personality and actual behaviours tend to find them in the range of roughly 0.0 to 0.3, maybe 0.4 if you are really lucky. Since variance explained is the square of the correlation (0.4² = 0.16), "personality" measured this way is explaining something like 16% of the variance in behaviour, at most.
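To make the arithmetic concrete: for a simple linear correlation r, the share of variance explained is r squared. Here is a quick Python sketch over the range mentioned above. The helper name `variance_explained` is just an illustrative choice of mine, not something from a library or from the paper.

```python
def variance_explained(r: float) -> float:
    """Return r squared, the fraction of variance a linear correlation r accounts for."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return r * r


# Correlation strengths in the rough range quoted above (illustrative values only).
for r in (0.1, 0.2, 0.3, 0.4):
    print(f"r = {r:.1f} -> about {variance_explained(r):.0%} of variance")
```

Even at the optimistic end (r = 0.4), that still leaves roughly 84% of the variance in behaviour unaccounted for by the self-report measure.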

▲

devmor 3 hours ago | parent [-]

I don’t think this is even limited to this part of academia, or to academia at all, but I do think it’s a bit irresponsible of them to assume prior rigor in those personality tests.

On top of that, a confounding issue is that it is human nature to anthropomorphize things. What is more likely to be anthropomorphized than a construct of written language, now the primary method of knowledge transfer between humans? I can’t help but feel that this wishful bias contributes to skipping the due diligence of choosing an appropriate metric to measure with.

▲

D-Machine 2 hours ago | parent [-]

Yup, I agree it is a general problem, and related to a tendency to over-anthropomorphize. At least in this case there was still something pretty good in the paper anyway.