A very human thing to do is not telling us which model failed like this! They are not all alike; from what I observe, some are an order of magnitude better at this kind of stuff than others.

I believe how "neurotypical" (for lack of a better word) you want a model to be is a design choice. (But I also believe model traits such as sycophancy, some hallucinations, or moral transgressions can be side effects of training it to be subservient. It is similar with humans: they tend to do these things when they are forced to perform.)

▲

nialse 9 hours ago | parent | next [-]

Codex in this case. I didn't even think about mentioning it. I'll update the post if it's actually relevant. Which I guess it is.

EDIT: It's specifically GPT-5.4 High in the Codex harness.

	▲	anuramat 8 hours ago | parent | next [-]
		Weird, for me it was too un-human at first, taking everything literally even when that didn't make sense. I started being more precise with prompting, to the point where it felt like "metaprogramming in English". Claude, on the other hand, was exactly as described in the article.
	▲	zingar 8 hours ago | parent | prev [-]
		Also the exact model/version if you haven't already.
