Remix.run Logo
djeastm 4 days ago

I thought reinforcement learning with human feedback was meant to get that quantification of "taste"