dpoloncsak | 4 hours ago
Do LLMs require that much more data than the traditional ML approaches we've seen over the years?
sigmoid10 | 4 hours ago
Yes. This is pretty well established. Neural networks in general are considerably less sample-efficient than traditional ML methods. The reason they became so successful is that they scale better as you increase training data and model size. Only with modern compute power did they become useful outside of academic toy-model applications.
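A minimal sketch of the sample-efficiency point, assuming scikit-learn and a synthetic dataset (the model sizes and the 50-example training set are illustrative choices, not from the thread): with very little labeled data, a simple linear model often holds up as well as or better than a small neural net, though exact numbers vary by seed and data.

```python
# Hypothetical comparison: linear model vs. small neural net on a tiny
# training set, to illustrate the sample-efficiency claim above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification task; only 50 labeled training examples.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=50, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)
net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

print(f"logistic regression accuracy: {linear.score(X_test, y_test):.2f}")
print(f"small MLP accuracy:           {net.score(X_test, y_test):.2f}")
```

Rerunning with `train_size=1000` typically narrows or reverses the gap, which is the scaling behavior the comment describes.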
Forgeties79 | 2 hours ago
That’s not primarily the issue I’m hitting here, but yes. My concern is that I can open up ChatGPT, even with a free, “anonymous” account, and run an assembly line generating tens of thousands of words a day to pump to Twitter, good enough to prop up multiple fake accounts and cause mayhem. Now make it thousands of people like me doing it. Now add funding and political orgs. Add company leadership that turns a blind eye so long as it drives engagement. This scale and pipeline wasn’t possible 5 years ago, even if we clearly see the throughline.

I’m not even getting into fake images. Those used to require some know-how. Now there are basically no hurdles, and even if most people learn it’s fake, millions likely won’t. If you’re a little lucky, less scrupulous “news” outlets will amplify it for you, for free.