Remix.run Logo
avianlyric 2 hours ago

Pace of data creation ignores the fact that the majority of the big gains in LLM “intelligence” has come from scraping and feeding in the huge amount of public data that already exists.

Unless we’re producing data on the order of an entire new internet every couple of years, then it’s hard to see how LLMs can achieve further huge leaps in capability compared to training on effectively 0% of the internet vs 100% of the internet.