Remix.run Logo
A4ET8a8uTh0_v2 2 hours ago

It is. It seems you can't seem to be able to tell why though. There is some qualified value in alignment, but what it is being used for is on verge of silliness. At best, it is neutering it in ways we are now making fun of China for. At best.

XenophileJKO an hour ago | parent [-]

I think another good example was the recent example of when a model learned to "cheat" on a metric during reinforcement it also started cheating on unrelated tasks.

My assumption is when encouraging "double-speak", you will have knock-on effects that you don't really want in the model for something making important decisions and asked to build non-trivial things.