beeflet 2 days ago
Maybe AI will become more ubiquitous. But I predict LLMs will be capped by the amount of training data present in the wild.
MountDoom a day ago
Ubiquity doesn't depend on the AI getting much better as much as it depends on the computational cost going down (i.e., better hardware + software optimizations). When you can put a ChatGPT-class model locally on every desktop or phone, people will use it even if the accuracy or safety isn't quite there. Just look at how people are using Grok on Twitter, or how they're pasting ChatGPT output to win online arguments, or how they're trusting Google AI snippets. This is only gonna escalate. That said, this is probably not the future Sam Altman is talking about. His vision for the future must justify the sky-high valuations of OpenAI, and cheap ubiquity of this non-proprietary tech runs counter to that. So his "ubiquity" is some sort of special, qualified ubiquity that is 100% dependent on his company.
nradov 2 days ago
That's why the competitive moat for frontier LLMs is access to proprietary training data. OpenAI and their competitors are paying fortunes to license private data sets, and in some cases even hiring human experts to write custom documents on specific topics as additional training data. This is how they hope to stay ahead of open-source alternatives.
kulahan 2 days ago
I think it'll be slightly different - without clearly marking AI-generated content, it'll be effectively impossible to find new content that isn't sold to you in pristine packages already, and even that you just sorta have to trust. Of course, you can't train LLMs on LLM-generated content.
bilbo0s 2 days ago
I'm more worried that publicly available LLMs "will be capped by the amount of training data present in the wild". But private LLMs, available only to the wealthy and powerful, will have additional, more pristine and accurate, data sources made available to them for training. Think about the legal field. The masses tend to use Google, whereas the wealthy and powerful all use LexisNexis. Who do you think has been winning in court?