benashford 4 hours ago

Intuitively this feels obvious: content generated by the model is shaped by its training, so when the model reads it back it resonates with that same training and gets a favorable judgement as a result.

Human when preparing a CV: "Make my CV more professional"

LLM many days later presenting a report to HR: "This CV is really professional"

There's probably more to it than that of course.

But it justifies my personal policy of using a different LLM family for code review than for code generation, to avoid the "marking your own homework" problem.
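That cross-family policy can be sketched as a tiny routing rule. Everything here is illustrative, not the commenter's actual setup: the family names, `pick_reviewer`, and the stubbed `review_code` are placeholders you'd replace with real API clients.

```python
# Hypothetical sketch of "use a different LLM family to review than to generate".
# MODEL_FAMILIES and the functions below are invented for illustration.

MODEL_FAMILIES = ["anthropic", "openai", "google"]

def pick_reviewer(generator_family: str) -> str:
    """Pick a review model from a different family than the generator,
    so the reviewer isn't marking its own homework."""
    candidates = [f for f in MODEL_FAMILIES if f != generator_family]
    if not candidates:
        raise ValueError("need at least two model families configured")
    return candidates[0]  # or rotate/randomize among candidates

def review_code(code: str, generator_family: str) -> str:
    reviewer = pick_reviewer(generator_family)
    # Placeholder: in practice, send `code` to the reviewer family's API here.
    return f"review by {reviewer}"
```

For example, code generated by an `"anthropic"` model would be routed to an `"openai"` reviewer under this rule, and vice versa.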

gzread 3 hours ago

And not in human-interpretable ways. An LLM was told to behave in a certain way and then output random numbers. When the numbers were pasted to another LLM instance, it also behaved that way. I wish I remembered more about that study or had a link to it - it was fascinating.

mnicky 3 hours ago

Wasn't it this one?

Article: https://alignment.anthropic.com/2025/subliminal-learning/

Paper: https://arxiv.org/abs/2507.14805