lordofgibbons 9 hours ago

We don't have a data scarcity problem. Refinement of the pretraining stage will continue, but I no longer expect orders of magnitude of additional scaling to be required. What's lacking is RL datasets and environments.
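
A minimal sketch of what an "RL environment" means here, assuming a Gym-style interface: the observation is a task prompt, the action is the model's text answer, and the reward is a programmatic check. TextEnv and its toy tasks are hypothetical, purely for illustration.

    # Hypothetical Gym-style environment for LLM post-training.
    # One-step episodes: prompt out, text answer in, scalar reward back.
    class TextEnv:
        _TASKS = [
            ("What is 7 * 8?", "56"),
            ("Reverse the string 'abc'.", "cba"),
        ]

        def __init__(self):
            self._i = -1  # index of the current task

        def reset(self) -> str:
            """Advance to the next task and return its prompt (the observation)."""
            self._i = (self._i + 1) % len(self._TASKS)
            return self._TASKS[self._i][0]

        def step(self, action: str):
            """Score the model's text action; the episode ends immediately."""
            expected = self._TASKS[self._i][1]
            reward = 1.0 if expected in action else 0.0
            return None, reward, True, {}  # obs, reward, done, info

    env = TextEnv()
    prompt = env.reset()                 # "What is 7 * 8?"
    _, reward, done, _ = env.step("56")  # a policy's output would go here

Building many such environments with verifiable rewards (and the task data behind them) is the scarce resource, not raw web text.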

If any more scaling does happen, it will be in the mid-training stage (using agentic/reasoning outputs from previous model versions) and in RL training.
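
A rough sketch of that mid-training recipe, assuming a Hugging Face causal LM; "org/prev-model" is a placeholder, not a real checkpoint. Step 1 samples reasoning traces from the previous model generation; step 2 treats those traces as ordinary language-modeling data for the next model.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("org/prev-model")
    prev = AutoModelForCausalLM.from_pretrained("org/prev-model")

    # Step 1: sample reasoning traces from the previous model generation.
    prompts = ["Q: 17 * 24 = ? Think step by step.\nA:"]
    inputs = tok(prompts, return_tensors="pt")
    traces = prev.generate(**inputs, max_new_tokens=256, do_sample=True)
    # (In practice the traces would be filtered, e.g. by answer
    # correctness, before being reused as training data.)

    # Step 2: one ordinary language-modeling step on the self-generated data.
    next_model = AutoModelForCausalLM.from_pretrained("org/prev-model")
    loss = next_model(input_ids=traces, labels=traces.clone()).loss
    loss.backward()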

williamtrask 9 hours ago | parent

I agree with you in a way: it seems likely that new data will be incorporated in more inference-like ways. RAG is a little extreme... but I think there's going to be a middle ground between full pre-training and RAG. Git Re-Basin, MoE, etc.
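
A toy sketch of the Git Re-Basin idea (permutation-align one network's hidden units to another's, then average the weights), assuming a single-hidden-layer MLP in numpy; real models need this matching done per layer, and all shapes and data here are made up.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    d_in, d_hid, d_out = 8, 16, 4
    # Stand-ins for two independently trained MLPs: x -> W1 -> relu -> W2.
    # Weights use the (out, in) convention.
    A = {"W1": rng.normal(size=(d_hid, d_in)), "W2": rng.normal(size=(d_out, d_hid))}
    B = {"W1": rng.normal(size=(d_hid, d_in)), "W2": rng.normal(size=(d_out, d_hid))}

    # cost[i, j]: similarity between A's hidden unit i and B's hidden unit j,
    # measured through both the incoming and outgoing weights.
    cost = A["W1"] @ B["W1"].T + A["W2"].T @ B["W2"]
    row, col = linear_sum_assignment(cost, maximize=True)

    # Permute B's hidden units to line up with A's, then average ("merge").
    B_aligned = {"W1": B["W1"][col], "W2": B["W2"][:, col]}
    merged = {k: 0.5 * (A[k] + B_aligned[k]) for k in A}

Merging or routing among aligned experts like this is one way new data could enter a model without a full pre-training run, which is the middle ground being gestured at.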