ninjagoo 18 hours ago

> Superior architectures will leak pretty quickly via engineers.

I agree with the conclusion you draw (i.e., openness), but for different reasons:

First, aren't these bleeding-edge 'newfangled' LLMs basically variations on the same core ideas from "Attention Is All You Need" from 2017 [1]? Different scale, but the same basic architecture. Even the MoE (Mixture-of-Experts) innovation keeps the Transformer attention stack while replacing or augmenting the dense feed-forward/MLP part with routed expert blocks.
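To make that concrete, here's a minimal sketch of an MoE-style feed-forward block in toy PyTorch. The class name, dimensions, and top-k routing scheme are my illustrative assumptions, not any particular lab's implementation:

  # Minimal MoE feed-forward sketch: attention is untouched; the dense
  # MLP is swapped for routed expert MLPs. Illustrative only.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class MoEFeedForward(nn.Module):
      def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
          super().__init__()
          self.top_k = top_k
          self.router = nn.Linear(d_model, n_experts)  # learned gating
          self.experts = nn.ModuleList(
              nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                            nn.Linear(d_ff, d_model))
              for _ in range(n_experts))

      def forward(self, x):  # x: (tokens, d_model)
          gate = F.softmax(self.router(x), dim=-1)
          weights, idx = gate.topk(self.top_k, dim=-1)
          weights = weights / weights.sum(-1, keepdim=True)
          out = torch.zeros_like(x)
          for slot in range(self.top_k):   # each token visits top_k experts
              for e, expert in enumerate(self.experts):
                  mask = idx[:, slot] == e
                  if mask.any():
                      out[mask] += weights[mask, slot, None] * expert(x[mask])
          return out

The point being: capacity scales with n_experts while per-token compute stays roughly constant, since only top_k experts fire per token. The attention layers around this block are unchanged from 2017.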

And I would argue that engineers aren't the ones working on new architectures. That would be researchers, working on:

  State-space models/Mamba (CMU/Princeton ecosystem; toy sketch after this list), 
  Diffusion Language Models (Inception Labs), 
  Long-convolution architectures/Hyena (Stanford etc.), 
  RWKV/Recurrent LLMs (open-source community), 
  Memory-augmented architectures (Google Research/DeepMind?), 
  World models/spatial intelligence (LeCun/Fei-Fei Li/DeepMind), 
  Symbolic/neurosymbolic alternatives, 
  Thousand brains (Numenta).
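
To illustrate why the first item on that list is a genuinely different architecture, here's a toy version of the state-space recurrence. Random matrices and dimensions are my own illustrative setup; real Mamba-style models use input-dependent, structured parameters:

  # Toy state-space recurrence (the core idea behind SSMs): a fixed-size
  # state is updated per token, so no attention matrix, no growing KV cache.
  import torch

  d_state, d_in = 16, 4
  A = torch.randn(d_state, d_state) * 0.1  # state transition
  B = torch.randn(d_state, d_in)           # input projection
  C = torch.randn(d_in, d_state)           # output projection

  def ssm_scan(xs):  # xs: (seq_len, d_in)
      h = torch.zeros(d_state)
      ys = []
      for x in xs:
          h = A @ h + B @ x  # h_t = A h_{t-1} + B x_t
          ys.append(C @ h)   # y_t = C h_t
      return torch.stack(ys)

  print(ssm_scan(torch.randn(10, d_in)).shape)  # torch.Size([10, 4])

The fixed-size state h is the contrast with attention: memory doesn't grow with context length, which is why these get pitched as alternatives for long sequences.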
That research is still open, so the outcome you propose (openness) is likely to come to pass. Researchers/scientists gotta publish, otherwise it's not science (to quote LeCun [2]).

[1] https://arxiv.org/abs/1706.03762

[2] https://x.com/ylecun/status/1795589846771147018