zkmon 10 hours ago
I'm guessing 3.5-27B would beat 3.6-35B. MoE is a bad idea: for the same VRAM, the 27B model leaves a lot more room for context, and the quality of the output depends heavily on context size, not just the "B" number.
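A rough back-of-envelope sketch of that VRAM argument: with a fixed memory budget, whatever the weights don't consume is what's left for KV cache (context). The quantization overhead and layer/head geometry below are illustrative assumptions, not the configs of any real 27B or 35B model.

  # VRAM split between weights and KV cache, all numbers assumed for illustration.
  VRAM_GIB = 24                              # hypothetical single 24 GB GPU
  BYTES_PER_PARAM = 0.55                     # ~4-bit quant incl. overhead
  LAYERS, KV_HEADS, HEAD_DIM = 46, 16, 128   # assumed identical geometry for both

  def weights_gib(params_b):
      return params_b * 1e9 * BYTES_PER_PARAM / 2**30

  def kv_gib_per_token():
      # K and V per layer: kv_heads * head_dim fp16 values each
      return 2 * LAYERS * KV_HEADS * HEAD_DIM * 2 / 2**30

  for params_b in (27, 35):
      free = VRAM_GIB - weights_gib(params_b)
      print(f"{params_b}B: weights ~{weights_gib(params_b):.1f} GiB, "
            f"room for ~{free / kv_gib_per_token():,.0f} tokens of context")

Under these assumptions the 27B leaves roughly 10 GiB free (tens of thousands of tokens of KV cache) versus about 6 GiB for the 35B.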
zozbot234 9 hours ago | parent | next
MoE is not a bad idea for local inference if you have fast storage to offload to, and this is quickly becoming feasible with PCIe 5.0 interconnect.
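A rough bound on what storage offload costs per token, assuming only the active experts are streamed from disk. The active-parameter count, quantization overhead, and drive bandwidth are all illustrative assumptions.

  # Worst-case decode speed when active expert weights are read from NVMe.
  ACTIVE_PARAMS_B = 3.0              # hypothetical MoE, ~3B params used per token
  BYTES_PER_PARAM = 0.55             # ~4-bit quant incl. overhead
  PCIE5_NVME_GBPS = 14.0             # ~PCIe 5.0 x4 sequential read

  bytes_per_token = ACTIVE_PARAMS_B * 1e9 * BYTES_PER_PARAM
  tokens_per_sec = PCIE5_NVME_GBPS * 1e9 / bytes_per_token
  print(f"~{tokens_per_sec:.1f} tok/s if every expert read misses the cache")
  # Expert reuse across tokens keeps much of this cached in RAM,
  # so real throughput sits well above this worst case.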
perbu 8 hours ago | parent | prev
MoE is excellent for unified-memory inference hardware like the DGX Spark, Mac Studio, etc. The large memory pool means you can fit quite a few B's, and the smaller active experts keep those tokens flowing fast.
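The intuition here is that decode is roughly memory-bandwidth-bound, and with MoE only the active slice of the weights is read per token. A sketch with assumed bandwidth and model figures (not measurements of any specific machine or model):

  # Upper-bound tokens/sec when decode is limited by memory bandwidth.
  MEM_BW_GBPS = 800                  # ~Mac Studio class unified memory (assumed)
  BYTES_PER_PARAM = 0.55             # ~4-bit quant incl. overhead

  def peak_tok_per_s(active_params_b):
      bytes_per_token = active_params_b * 1e9 * BYTES_PER_PARAM
      return MEM_BW_GBPS * 1e9 / bytes_per_token

  print(f"dense 70B:      ~{peak_tok_per_s(70):.0f} tok/s upper bound")
  print(f"MoE, 3B active: ~{peak_tok_per_s(3):.0f} tok/s upper bound")
  # Total parameters only need to fit in the (large) unified memory;
  # per-token speed tracks the small active slice.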