> Is there a size cutoff you would say where diminishing returns really kick in?
No idea yet. But it's also obvious that making LLMs without MoE (mixture of experts) is stupid.