Predicting When RL Training Breaks Chain-of-Thought Monitorability

Predicting When RL Training Breaks Chain-of-Thought Monitorability (lesswrong.com)
1 point by gmays 9 hours ago