marcosdumay 3 days ago

> In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other

A person has a vocabulary of some ~10k words, each fitting into specific places in a really small set of rules. All combined, we probably have something on the order of a few million rules in a language.
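As a back-of-envelope sketch of that estimate (the per-word slot count below is a made-up illustrative number, not a measurement):

    # Rough sketch of the arithmetic above; slots_per_word is an
    # illustrative assumption, not a measured figure.
    vocabulary = 10_000       # ~10k word vocabulary
    slots_per_word = 300      # assumed rule slots a typical word can fill
    rules = vocabulary * slots_per_word
    print(f"{rules:,} word-rule pairings")  # 3,000,000 -> "a few million"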

That is, yes, larger than what the model in this paper can handle, but nowhere near large enough to require something the size of a modern LLM. So it's well worth trying to enlarge models with other architectures, trying hybrid models (note that this one is necessarily hybrid already), and exploring every other possibility out there.