christina97 2 hours ago
It seems worse than the 26B A4B in every respect? I would have thought dense models still beat MoE on many benchmarks. Is the entire point of this model, then, that it runs when you don’t have enough GPU memory to load the 26B? That one runs faster anyway because it has fewer active params.