teaearlgraycold 2 hours ago
Qwen3.5 35B A3B is much faster and fits if you get a 3-bit version. How fast are you getting 27B to run? On my M3 Air with 24GB of memory, 27B runs at 2 tok/s but 35B A3B runs at 14-22 tok/s, which is actually usable.
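A rough back-of-envelope for the "fits at 3 bits" part, as a sketch only. It assumes the parameter counts are the nominal 35B and 27B from the model names, counts weights only (no KV cache, activations, or runtime overhead), and uses flat bits-per-weight figures; real quant formats usually land somewhat above their nominal bit width. `weight_gib` is a hypothetical helper, not part of any library.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal sizes taken from the model names in this thread (assumption).
for name, params, bpw in [
    ("35B A3B @ 3 bpw", 35, 3.0),
    ("27B @ 4 bpw", 27, 4.0),
]:
    print(f"{name}: ~{weight_gib(params, bpw):.1f} GiB of weights")
```

Whatever is left of the 24GB after the weights has to cover context and everything else running on the machine, so treat the result as a lower bound rather than a guarantee that a given quant will load.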
throwdbaaway 33 minutes ago
Using ik_llama.cpp to run a 27B 4bpw quant on an RTX 3090, I get 1312 tok/s PP and 40.7 tok/s TG at zero context, dropping to 1009 tok/s PP and 36.2 tok/s TG at 40960 context. 35B A3B is faster but didn't do too well in my limited testing.
ece 2 hours ago
The 27B is rated slightly higher for SWE-bench.