▲ | carlita_express 3 days ago
> we’ve observed that large-scale reinforcement learning exhibits the same “more compute = better performance” trend observed in GPT‑series pretraining.

Didn’t the pivot to RL from pretraining happen because the scaling “law” didn’t deliver the expected gains? (Or at least because O(log) increases in model performance became unreasonably costly?) I see they’ve finally resigned themselves to calling these trends, not laws, but trends are often fleeting. Why should we expect this one to hold for much longer?
▲ | anothermathbozo 3 days ago | parent | next [-]
This isn't exactly the case. The trend is on a log scale, so each 10x in pretraining compute should yield roughly a 10% increase in performance. That's not proving false per se; rather, they are encountering practical limitations around 10x'ing data volume and 10x'ing available compute.
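For intuition, here's a minimal sketch of what a log-scale trend implies (my own toy numbers, not from the article): if the score grows like a + b*log10(compute), each 10x of compute adds the same fixed increment, so every further increment costs 10x more compute than the last.

    import math

    # Toy illustration only: a log-linear scaling trend of the form
    # score ~= a + b * log10(compute). Coefficients are invented.
    a, b = 0.0, 0.03  # hypothetical: +0.03 score for every 10x of compute

    def predicted_score(flops: float) -> float:
        return a + b * math.log10(flops)

    # Each 10x of compute buys the same additive bump (b), so the cost of
    # each successive bump grows exponentially -- the practical limit above.
    for flops in (1e21, 1e22, 1e23, 1e24, 1e25):
        print(f"{flops:.0e} FLOPs -> score {predicted_score(flops):.2f}")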
| ||||||||||||||||||||||||||
▲ | og_kalu 3 days ago | parent | prev [-]
It doesn't need to hold forever, or even "much longer" depending on your definition of that duration. It just needs to hold long enough to realize certain capabilities. Will it? Who knows. But seeing as this is something you can't predict ahead of time, it makes little sense not to try, insofar as the whole thing is still feasible.