Remix.run Logo
KaiserPro 2 hours ago

wait, is this news?

Of course they reflect the bias in the training, thats been known since the 90s if not longer (see apocryphal story about training to detect tanks, but only detecting either trees or clouds)

but like this is expected, the whole point of RLHF (or any other feedback) is to condition the model to respond in a certain way. Thats what makes them useable for a bunch of situations.