otabdeveloper4 3 hours ago
MoE and such are basically performance enhancements; they don't make the model smarter.
yababa_y 3 hours ago
Separately trained experts can perform better in their activated regime, and that DOES result in a smarter model. The Claude system cards talk about this, and there is e.g. https://openreview.net/forum?id=iydmH9boLb to read...