ai_slop_hater 3 hours ago

No, they are clearly not just scaled-up versions of GPT-2; there are different LLM architectures, like mixture of experts, that appeared relatively recently. I am not an expert though, far from it.

otabdeveloper4 3 hours ago | parent [-]

MoE and such are basically performance enhancements; they don't make the model smarter.

yababa_y 3 hours ago | parent [-]

Separately trained experts can perform better in their activated regime, and that DOES result in a smarter model. The Claude system cards talk about this, and e.g. there is https://openreview.net/forum?id=iydmH9boLb to read...