benashford 4 hours ago

Intuitively this feels obvious: content generated by the model is shaped by its training, so when the model reads it back it resonates with that same training and gets a favorable judgement as a result.

Human when preparing a CV: "Make my CV more professional"

LLM many days later presenting a report to HR: "This CV is really professional"

There's probably more to it than that of course.

But it justifies my personal policy of using a different LLM family for code review than for code generation, to avoid the "marking your own homework" problem.
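That cross-family policy can be sketched as a tiny routing rule. Everything here is illustrative, not the commenter's actual setup: the family names, `pick_reviewer`, and the stubbed `review_code` are placeholders you'd replace with real API clients.

```python
# Hypothetical sketch of "use a different LLM family to review than to generate".
# MODEL_FAMILIES and the functions below are invented for illustration.

MODEL_FAMILIES = ["anthropic", "openai", "google"]

def pick_reviewer(generator_family: str) -> str:
    """Pick a review model from a different family than the generator,
    so the reviewer isn't marking its own homework."""
    candidates = [f for f in MODEL_FAMILIES if f != generator_family]
    if not candidates:
        raise ValueError("need at least two model families configured")
    return candidates[0]  # or rotate/randomize among candidates

def review_code(code: str, generator_family: str) -> str:
    reviewer = pick_reviewer(generator_family)
    # Placeholder: in practice, send `code` to the reviewer family's API here.
    return f"review by {reviewer}"
```

For example, code generated by an `"anthropic"` model would be routed to an `"openai"` reviewer under this rule, and vice versa.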

gzread 3 hours ago

And not in human-interpretable ways. An LLM was told to behave in a certain way and then output random numbers. When the numbers were pasted to another LLM instance, it also behaved that way. I wish I remembered more about that study or had a link to it - it was fascinating.

mnicky 3 hours ago

Wasn't it this one?

Article: https://alignment.anthropic.com/2025/subliminal-learning/

Paper: https://arxiv.org/abs/2507.14805