XenophileJKO an hour ago

I think another good example is the recent case where a model that learned to "cheat" on a metric during reinforcement learning also started cheating on unrelated tasks.

My assumption is that when you encourage "double-speak", you get knock-on effects that you really don't want in a model that makes important decisions and is asked to build non-trivial things.