ninjagoo 18 hours ago

> Superior architectures will leak pretty quickly via engineers.

I agree with the conclusion you draw (i.e., openness), but for different reasons:

First, aren't these bleeding-edge 'newfangled' LLMs basically variations on the same core ideas from "Attention Is All You Need" from 2017 [1]? Different scale, but the same basic architecture. Even the MoE (Mixture-of-Experts) innovation keeps the Transformer attention stack while replacing or augmenting the dense feed-forward/MLP part with routed expert blocks.
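To make that concrete, here's a minimal sketch of an MoE-style feed-forward block in toy PyTorch. The class name, dimensions, and top-k routing scheme are my illustrative assumptions, not any particular lab's implementation:

  # Minimal MoE feed-forward sketch: attention is untouched; the dense
  # MLP is swapped for routed expert MLPs. Illustrative only.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class MoEFeedForward(nn.Module):
      def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
          super().__init__()
          self.top_k = top_k
          self.router = nn.Linear(d_model, n_experts)  # learned gating
          self.experts = nn.ModuleList(
              nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                            nn.Linear(d_ff, d_model))
              for _ in range(n_experts))

      def forward(self, x):  # x: (tokens, d_model)
          gate = F.softmax(self.router(x), dim=-1)
          weights, idx = gate.topk(self.top_k, dim=-1)
          weights = weights / weights.sum(-1, keepdim=True)
          out = torch.zeros_like(x)
          for slot in range(self.top_k):   # each token visits top_k experts
              for e, expert in enumerate(self.experts):
                  mask = idx[:, slot] == e
                  if mask.any():
                      out[mask] += weights[mask, slot, None] * expert(x[mask])
          return out

The point being: capacity scales with n_experts while per-token compute stays roughly constant, since only top_k experts fire per token. The attention layers around this block are unchanged from 2017.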

And I would argue that engineers aren't the ones working on new architectures. That would be researchers, working on:

  State-space models/Mamba (CMU/Princeton ecosystem; toy sketch after this list), 
  Diffusion Language Models (Inception Labs), 
  Long-convolution architectures/Hyena (Stanford etc.), 
  RWKV/Recurrent LLMs (open-source community), 
  Memory-augmented architectures (Google Research/DeepMind?), 
  World models/spatial intelligence (LeCun/Fei-Fei Li/DeepMind), 
  Symbolic/neurosymbolic alternatives, 
  Thousand brains (Numenta).
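
To illustrate why the first item on that list is a genuinely different architecture, here's a toy version of the state-space recurrence. Random matrices and dimensions are my own illustrative setup; real Mamba-style models use input-dependent, structured parameters:

  # Toy state-space recurrence (the core idea behind SSMs): a fixed-size
  # state is updated per token, so no attention matrix, no growing KV cache.
  import torch

  d_state, d_in = 16, 4
  A = torch.randn(d_state, d_state) * 0.1  # state transition
  B = torch.randn(d_state, d_in)           # input projection
  C = torch.randn(d_in, d_state)           # output projection

  def ssm_scan(xs):  # xs: (seq_len, d_in)
      h = torch.zeros(d_state)
      ys = []
      for x in xs:
          h = A @ h + B @ x  # h_t = A h_{t-1} + B x_t
          ys.append(C @ h)   # y_t = C h_t
      return torch.stack(ys)

  print(ssm_scan(torch.randn(10, d_in)).shape)  # torch.Size([10, 4])

The fixed-size state h is the contrast with attention: memory doesn't grow with context length, which is why these get pitched as alternatives for long sequences.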
That research is still open, so the outcome you propose (openness) is likely to come to pass. Researchers/scientists gotta publish, otherwise it's not science (to quote LeCun [2]).

[1] https://arxiv.org/abs/1706.03762

[2] https://x.com/ylecun/status/1795589846771147018