visarga 7 hours ago

I think extraction from the model itself is a bad idea. But extracting from external sources, such as the deep research reports LLMs generate, or from solving problems where correctness can be validated, is a good idea. In those cases the model does not validate its outputs by simply running another inference; it consults external sources or gets feedback from code execution. Humans in chat rooms could also provide a lot of learning signal, especially when actions are judged in hindsight against the outcomes they cause down the line.

So in short, what works is a model + a way to tell its good outputs from bad ones.
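
To make the "model + verifier" idea concrete, here is a rough sketch of the code-execution case. Everything in it is an assumption for illustration: the `candidates` list stands in for outputs sampled from a model, and `passes_tests` / `filter_verified` are hypothetical helper names, not part of any real training pipeline. The point is only that the keep/discard decision comes from running the code, not from asking the model again.

```python
from typing import List, Tuple


def passes_tests(source: str, func_name: str,
                 cases: List[Tuple[tuple, object]]) -> bool:
    """Execute candidate source and check it against known input/output pairs."""
    namespace: dict = {}
    try:
        # External feedback: actually run the candidate instead of re-querying the model.
        # (A real setup would sandbox this; plain exec is unsafe for untrusted code.)
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing function, runtime errors all count as failures.
        return False


def filter_verified(candidates: List[str], func_name: str,
                    cases: List[Tuple[tuple, object]]) -> List[str]:
    """Keep only the candidates that the external check accepts."""
    return [c for c in candidates if passes_tests(c, func_name, cases)]


# Hypothetical stand-ins for sampled model outputs.
candidates = [
    "def add(a, b):\n    return a - b",   # runs, but wrong answer
    "def add(a, b):\n    return a + b",   # correct
    "def add(a, b) return a + b",         # does not even parse
]
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

verified = filter_verified(candidates, "add", cases)
print(len(verified), "of", len(candidates), "candidates passed")
```

Only the surviving candidates would then be worth treating as learning signal; the rejected ones are exactly what a second inference pass alone could not reliably catch.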