lambda 4 hours ago

What do you mean it's unfit for training? It's a form of reinforcement learning; the end result has been selected based on whether it actually solved the need.

You need to be careful about the ratio of reinforcement learning to continued pretraining, but they already do plenty of other forms of reinforcement learning, so I'm sure they have it dialed in.