franktankbank 12 hours ago

Are you saying it wouldn't be able to converse using English of the time?

ben_w 12 hours ago | parent | next

Machine learning today requires an obscene quantity of examples to learn anything.

SOTA LLMs show quite a lot of skill, but only after reading a significant fraction of all published writing (and perhaps images and videos, I'm not sure) across all languages, in a world whose population is about five times what it was at the link's cut-off date, and in which global literacy has risen from roughly 20% to about 90% since then.

Computers can only make up for this by being really really fast: what would take a human a million or so years to read, a server room can pump through a model's training stage in a matter of months.
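For a sense of the scale, here is a rough back-of-the-envelope sketch in Python; the corpus size, tokens-to-words ratio, and reading speed are assumed illustrative numbers, not figures from the comment:

    # Back-of-the-envelope: how long would a human need to read an LLM's
    # training corpus? All numbers below are assumptions for illustration.

    TRAIN_TOKENS = 15e12        # assumed corpus size, ~15 trillion tokens
    WORDS_PER_TOKEN = 0.75      # rough tokens-to-words conversion
    READ_WPM = 250              # typical adult reading speed, words/minute
    READ_HOURS_PER_DAY = 8      # reading treated as a full-time job

    words = TRAIN_TOKENS * WORDS_PER_TOKEN
    minutes = words / READ_WPM
    years = minutes / 60 / READ_HOURS_PER_DAY / 365

    print(f"{words:.2e} words, ~{years:,.0f} years of full-time reading")
    # -> roughly 2.6e5 years under these assumptions; a larger corpus or a
    #    slower reading pace pushes it toward the "million or so" figure.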

When the data isn't there, reading what it does have really quickly isn't enough.

wasabi991011 12 hours ago | parent | prev

That's not what they are saying. SOTA models include much more than just language, and the scale of the training data is related to the model's "intelligence". Restricting the corpus in time => less training data => less intelligence => less ability to "discover" new concepts not in the training data.
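One way to make the "less training data => less intelligence" link concrete is a Chinchilla-style scaling law (Hoffmann et al., 2022), which predicts higher loss as the token budget shrinks. A minimal sketch, assuming the commonly cited fitted constants and hypothetical model and corpus sizes:

    # Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
    # Constants are the commonly cited fits from Hoffmann et al. (2022);
    # the model size N and data sizes D below are hypothetical.

    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

    def loss(n_params: float, n_tokens: float) -> float:
        """Predicted training loss for a model with n_params parameters
        trained on n_tokens tokens."""
        return E + A / n_params**alpha + B / n_tokens**beta

    N = 70e9  # a fixed 70B-parameter model (hypothetical)
    for D in (1.4e12, 1.4e11, 1.4e10):  # full corpus vs. 10x and 100x less data
        print(f"D = {D:.1e} tokens -> predicted loss {loss(N, D):.3f}")
    # Shrinking the corpus (e.g. restricting it to pre-1900 text) raises the
    # predicted loss, i.e. yields a measurably less capable model.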

franktankbank 12 hours ago | parent

Perhaps less bullshit, though, was my thought. Was language more restricted then? The scope of ideas?