franktankbank 12 hours ago
Are you saying it wouldn't be able to converse in the English of the time?
ben_w 12 hours ago | parent
Machine learning today requires an obscene quantity of examples to learn anything. SOTA LLMs show quite a lot of skill, but only after reading a significant fraction of all published writing (and perhaps images and videos, I'm not sure) across all languages, in a world whose population is five times higher than at the link's cut-off date, and in which literacy has gone from roughly 20% to about 90% since then. Computers can only make up for this by being really, really fast: what would take a human a million or so years to read, a server room can push through a model's training stage in a matter of months. When the data isn't there, reading the data it does have really quickly isn't enough.
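As a rough sanity check on that "million or so years" figure, here is a minimal back-of-the-envelope sketch in Python. All of the inputs (corpus size, words-per-token ratio, reading speed, hours per day) are illustrative assumptions, not numbers from the thread; the point is only that the answer lands in the hundreds of thousands to millions of human-years depending on what you assume.

```python
# Order-of-magnitude estimate: how long would one human need to read an
# LLM-scale training corpus? All constants below are assumptions.

corpus_tokens = 15e12          # assumed corpus size (~15 trillion tokens)
words_per_token = 0.75         # rough English words-per-token ratio
reading_wpm = 250              # typical adult reading speed, words per minute
reading_hours_per_day = 8      # treating reading as a full-time job

corpus_words = corpus_tokens * words_per_token
words_per_year = reading_wpm * 60 * reading_hours_per_day * 365

human_years = corpus_words / words_per_year
print(f"~{human_years:,.0f} years of full-time human reading")
# With these inputs: roughly 2.6e5 years. A larger corpus or a slower,
# more careful reading pace pushes the estimate toward a million years.
```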
wasabi991011 12 hours ago | parent
That's not what they are saying. SOTA models are trained on much more than just language, and the scale of the training data is related to their "intelligence". Restricting the corpus in time => less training data => less intelligence => less ability to "discover" new concepts that aren't in the training data.
| ||||||||