roughly 7 days ago

Like a lot of the research Anthropic has done, this and the “emergent misalignment” research they link to put more points in the “stochastic parrot” hypothesis column. The reason these LLM behaviors read as so weird to us is that we’re still anthropomorphizing the hell out of these systems - they can create very convincing dialogue, and the depth of the model suggests some surprising complexity, but the reason why, e.g., a random string of numbers will induce changes elsewhere in the model is that there’s simply nothing in the model to _be_ consistent. It is an extremely complex autocomplete algorithm that does a very effective cosplay of an “intelligent agent.”

My suspicion is that when we eventually find our way to AGI, these types of models will be a _component_ of those systems, but they lack some fundamental structuring that seems to be required to create anything like consistency or self-reflection.

(I’m also somewhat curious whether, given what we’re seeing about these models’ ability to consistently perform detailed work (or lack thereof), there’s some fundamental tradeoff between consciousness and general intelligence and the kind of computation we expect from our computers - in other words, whether we’re going to wind up giving our fancy AGIs pocket calculators so they can do math reliably.)
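
(For what it’s worth, a toy sketch of that “pocket calculator” hand-off, purely illustrative: the model only proposes an expression, and a deterministic tool does the arithmetic. model_suggests_expression is a hypothetical stand-in for an actual model call, not any real API.)

    import ast
    import operator

    # Deterministic "pocket calculator": evaluates simple arithmetic only, so the
    # answer never depends on the language model's token predictions.
    _OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}

    def calculator(expression: str):
        def _eval(node):
            if isinstance(node, ast.Expression):
                return _eval(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
                return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
            raise ValueError("unsupported expression")
        return _eval(ast.parse(expression, mode="eval"))

    def model_suggests_expression(question: str) -> str:
        # Stand-in for the LLM: a real system would have the model emit a tool
        # call here instead of guessing at the digits itself.
        return "1234 * 5678"

    print(calculator(model_suggests_expression("What is 1234 times 5678?")))  # 7006652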

mitjam 7 days ago | parent | next [-]

> they lack some fundamental structuring that seems to be required to create anything like consistency or self-reflection

A valid observation. Interestingly, feeding the persona vectors detected during inference back into the context might be a novel form of self-reflection for LLMs.
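
A minimal sketch of what that feedback loop might look like, assuming a persona direction has already been extracted and treating the hidden state as a plain array (the names and threshold here are hypothetical, not any published API):

    import numpy as np

    def persona_score(hidden_state: np.ndarray, persona_direction: np.ndarray) -> float:
        # Project the hidden state onto a precomputed persona direction
        # (e.g. a "sycophancy" vector) and return the scalar activation.
        unit = persona_direction / np.linalg.norm(persona_direction)
        return float(hidden_state @ unit)

    def reflect(prompt: str, hidden_state: np.ndarray,
                persona_direction: np.ndarray, threshold: float = 1.0) -> str:
        # If the persona activation is high, append a natural-language note to the
        # context so the next generation step can "see" its own internal drift.
        score = persona_score(hidden_state, persona_direction)
        if score > threshold:
            prompt += ("\n[self-check] persona activation is %.2f; "
                       "does this response still sound like the intended persona?\n" % score)
        return prompt

    # Toy usage with random vectors standing in for real activations; the threshold
    # is set to -inf so the note always gets appended in this demo.
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=4096)       # stand-in for one layer's hidden state
    direction = rng.normal(size=4096)    # stand-in for an extracted persona vector
    print(reflect("User: hello\nAssistant:", hidden, direction, threshold=float("-inf")))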

roughly 7 days ago | parent [-]

Yeah, and this may be part of what the brain is doing - a referent check on our personal sense of identity to validate whether or not a response or action seems like the sort of thing we would do - “given that I’m this kind of person, is this the sort of thing I’d say?”

(Noting that humans are, of course, not universally good at that kind of “identity” check either, or at least not universally good at letting it be guided by our “better natures”)

gedy 7 days ago | parent | prev | next [-]

> My suspicion is that when we eventually find our way to AGI, these types of models will be a _component_ of those systems

I think this is a good summary of the situation, and strikes a balance between the breathless hype and the sneering comments about “AI slop”.

These technologies are amazing! And I do think they are facsimiles of parts of the human mind (image diffusion is certainly similar to human dreams, in my opinion), but it still feels like we are missing an overall intelligence or coordination in this tech for the present.

roughly 7 days ago | parent | next [-]

I think this may also be why every discussion of the limitations of these models is met with a “well, humans also hallucinate/whatever” - because we _do_, but that’s often when some other part of the controlling mechanism has broken down. Psilocybin induces hallucinations by impairing the brain’s ability to ignore network outputs, and Kahneman and Tversky’s work on cognitive biases centers on the unchecked outputs of autonomous networks in the brain - in both cases, it’s the failure or bypass of the central regulatory network that induces failure modes that look like what we see in LLMs.

weitendorf 7 days ago | parent | prev [-]

The bitterest lesson is that we want slop (or, "slop is all you need").

Maybe you can recognize that someone else loves a certain kind of slop, but if LLMs became vastly more intelligent and capable, wouldn't it be better for them to interact with you on your level too, rather than at a much higher level that you wouldn't understand?

If you used it to make you a game or entertain you with stories, isn't that just your own preferred kind of slop?

If we automate all the practical stuff away then what is left but slop?
