rmuratov 4 days ago
How did we use "all the data"? New knowledge appears on the internet every day: new scientific articles and videos are published.
lend000 4 days ago | parent
At the speed AI is moving, we've effectively used it all; the high-quality data you need to make smarter models is coming in at a trickle. We're not getting 10^5 Principia Mathematicas published every day. Maybe I just don't have the vision to understand it, but it seems like AI-generated synthetic data shouldn't be able to train a model smarter than whatever produced that data. I can imagine synthetic data being useful for making models more efficient (that's what distilled models are, after all), but not for pushing the frontier.