zwnow 3 days ago
I also love how AI enthusiasts just ignore the issue of exhausted training data. You can't just magically create more training data, and synthetic training data reduces the quality of models.
aspenmartin 2 days ago
You're mixing up several concepts. Synthetic data works for coding because coding is a verifiable domain: you train via reinforcement learning, rewarding generated code that passes detailed specs and meets other desiderata. It's literally how things are done today and how progress gets made.
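The verifiable-reward idea above can be sketched in a few lines. This is a minimal illustration, not any lab's actual pipeline: `reward` is a hypothetical scoring function that executes a model-generated candidate against a spec's test suite and returns a binary signal an RL loop could optimize.

```python
import subprocess
import sys
import tempfile

def reward(solution_code: str, test_code: str, timeout: int = 5) -> float:
    """Binary reward: 1.0 if the candidate passes the tests, else 0.0.

    A real training setup would sandbox execution and use richer
    reward shaping; this just shows why code is 'verifiable'.
    """
    program = solution_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        # Any failed assert (or crash) gives a nonzero exit code.
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

# A model-generated candidate scored against the spec's tests:
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(reward(candidate, tests))
```

The point is that the reward comes from running the code, not from comparing it to human-written data, which is why this domain doesn't hit the "exhausted training data" wall the same way prose does.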
TeMPOraL 3 days ago
They don't ignore it, they just know it's not an actual problem. It saddens me to see AI detractors being stuck in 2022 and still thinking language models are just regurgitating bits of training data.
puchatek 3 days ago
That's been my main argument for why LLMs might be at their zenith. But I recently started wondering whether all those codebases we expose to them are maybe good enough training data for the next generation. It's not high quality like accepted Stack Overflow answers, but it's working software for the most part.