vessenes 8 hours ago
It’s a particular sort of bug that’s harder to detect because … internal Anthropic engineers don’t apply these prompts to themselves, and in fact have access to ‘helpful only’ models that also do not have additional limitations RL’ed in. (Or perhaps they’re RL’ed out — not sure of current training mechanisms.) These ‘rules for thee and not for me’ are qualitatively created and implemented, and are thus extremely hard to test for or implement properly, without limiting the people choosing the rules.
QuercusMax 6 hours ago | parent
They must have some sort of smoke tests for common operations, run in a test harness with the system prompts they force on users, right? ... Right? What kind of Mickey Mouse operation are they running over there?