CamperBob2 3 hours ago
Try the 27B dense model. It will likely do much better than the 35B MoE, which has only 3B active parameters. Also, performance on research-y questions isn't always a good indicator of how the model will do at code generation or agent orchestration.