ninjagoo 18 hours ago
> Superior architectures will leak pretty quickly via engineers.

I agree with the outcome of your premise (i.e., openness), but for different reasons. First, aren't these bleeding-edge 'newfangled' LLMs basically variations on the same core ideas from "Attention Is All You Need" (2017)? [1] Different scale, but the same basic architecture. Even the MoE innovation keeps the Transformer attention stack and only replaces (or augments) the dense feed-forward/MLP part with routed expert blocks, roughly as in the sketch below.
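To make that concrete, here's a minimal PyTorch sketch of the idea (my own illustration, not any particular lab's implementation; the class name, sizes, and routing details are all made up): the attention stack is untouched, and only the per-token FFN is swapped for a learned router plus a set of expert FFNs.

    # Minimal MoE feed-forward sketch (illustrative only, hypothetical names/sizes).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoEFeedForward(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
            super().__init__()
            # Each expert is an ordinary Transformer FFN (Linear -> GELU -> Linear).
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])
            self.router = nn.Linear(d_model, n_experts)  # learned routing scores
            self.top_k = top_k

        def forward(self, x):                      # x: (tokens, d_model)
            scores = self.router(x)                # (tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
            out = torch.zeros_like(x)
            # Each token is processed only by its top-k experts; the expert
            # outputs are combined using the routing weights.
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

Toy usage: MoEFeedForward()(torch.randn(16, 512)) returns a (16, 512) tensor, just like the dense FFN it replaces, which is the point: the surrounding architecture doesn't change.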
Second, I would argue that it isn't Engineers working on new architectures; that would be Researchers, and that research is still open, so the outcome you propose (openness) is likely to come to pass. Researchers/Scientists gotta publish, otherwise it's not science (to quote LeCun [2]).