Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

That is completely different from the models spying on the users, which is what is discussed here.

Xmd5a 6 days ago
as a vector. Train the model to start injecting backdoors past a certain date.

> Simple probes can catch sleeper agents

https://www.anthropic.com/research/probes-catch-sleeper-agen...