visarga 14 days ago
> What then? LLM learning from LLM doesn't really work, does it?

It does work. It is called RLVR, reinforcement learning with verifiable rewards, and it is based on testing code by execution. It has become a major area of improvement in the last year. But you are also forgetting the amount of steering and problem solving that goes into coding agents today, and the huge logs they create, which can feed back into training. We automated Stack Overflow: LLMs learn from usage and self-play.
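
To make "testing code by execution" concrete, here is a minimal sketch of what a verifiable reward could look like. The function name `verifiable_reward`, the test-case format, and the binary 1.0/0.0 scoring are all assumptions made for illustration, not the API of any particular training framework; real pipelines run candidates in a sandbox with timeouts rather than calling `exec` directly.

```python
def verifiable_reward(candidate_source: str, func_name: str,
                      cases: list[tuple[tuple, object]]) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Hypothetical helper: the reward comes from running the code,
    not from another model's opinion of it.
    """
    namespace: dict = {}
    try:
        # WARNING: unsandboxed exec, acceptable only for a toy example.
        exec(candidate_source, namespace)
        func = namespace[func_name]
        for args, expected in cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and crashes all score zero.
        return 0.0
    return 1.0


# Two model-written candidates for the same task, scored by execution.
cases = [((2, 3), 5), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
rewards = [verifiable_reward(src, "add", cases) for src in (good, bad)]
```

Here `rewards` comes out as `[1.0, 0.0]`: the signal is grounded in whether the code runs correctly, which is why the model is not simply learning from another model's output.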