sosodev a day ago:
That idea is called model collapse: https://en.wikipedia.org/wiki/Model_collapse

Some studies have shown that direct feedback loops do cause collapse, but many researchers argue that it's not a risk at real-world data scales. In fact, a lot of recent advancements in the open-weight model space have come from training on synthetic data. At least 33% of the data used to train NVIDIA's recent Nemotron 3 Nano model was synthetic; they use it as a way to get high-quality agent capabilities without doing tons of manual work.
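A minimal sketch of the feedback loop in question, using the toy Gaussian setup common in the model-collapse literature (illustrative only; the variable names and parameters here are my own, and this is not how any production model is trained):

    # Fit a Gaussian, sample from the fit, refit on the samples, repeat.
    # With small samples, the estimated spread drifts toward zero over
    # generations: the tails of the original distribution get forgotten.
    import numpy as np

    rng = np.random.default_rng(0)
    real_data = rng.normal(loc=0.0, scale=1.0, size=50)  # "human" data

    mu, sigma = real_data.mean(), real_data.std()
    for gen in range(1, 201):
        synthetic = rng.normal(mu, sigma, size=50)  # train only on model output
        mu, sigma = synthetic.mean(), synthetic.std()
        if gen % 25 == 0:
            print(f"generation {gen:3d}: sigma = {sigma:.4f}")

Run it and sigma typically shrinks by an order of magnitude or more within a couple hundred generations; that is the "direct feedback loop" failure mode. Mixing fresh real data back in at each generation damps the collapse, which is the gist of the counterargument about real-world data scales.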
ehnto 21 hours ago:
That's not quite the same thing, I think. The risk here is that the sources of training information vanish as well, not necessarily the feedback loop aspect.

For example, all the information on the web could be said to be a distillation of human experience, and much of it ended up online through discussions that happened during problem solving. Questions were asked of humans, and they answered with knowledge from the real world and years of experience. If no one asks humans anymore, because they just ask LLMs, then no new discussions between humans occur online, and that experience never gets recorded in a form models can train on.

That is essentially what Stack Overflow's entire existence has been until now, and you can pretty confidently predict that little new software experience will be put into Stack Overflow from here on. So what of new programming languages or technologies and all the nuances within them? Docs never have all the answers, so models will simply lack the nuanced information.
bandrami 14 hours ago:
The Habsburgs thought it wouldn't be a problem either.
sethops1 a day ago:
Can't help but wonder if that's a strategy that works until it doesn't.