gzread | 3 hours ago
And not in human-interpretable ways. An LLM was told to behave in a certain way and then asked to output random numbers. When those numbers were pasted to another LLM instance, it also behaved that way. I wish I remembered more about that study or had a link to it; it was fascinating.
mnicky | 3 hours ago | parent
Wasn't it this one? https://alignment.anthropic.com/2025/subliminal-learning/