mhitza 4 hours ago |
> The finding I did not expect: model quality matters more than token speed for agentic coding.

I'm really surprised that wasn't obvious. Also, instead of limiting the context size to something like 32k, you can offload the MoE expert weights to the CPU with --cpu-moe, at the cost of roughly halving token generation speed.
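For reference, a sketch of what such an invocation looks like with llama.cpp's llama-server. The model path and the specific context size are illustrative assumptions, not from the comment; --cpu-moe is the flag being described:

```shell
# Illustrative llama-server invocation (model path and context size are
# hypothetical). --cpu-moe keeps the MoE expert tensors in system RAM
# while the remaining layers stay on the GPU, trading some generation
# speed for a much larger usable context in limited VRAM.
llama-server \
  -m ./model.gguf \
  -ngl 99 \
  -c 131072 \
  --cpu-moe
```

The trade-off: expert weights dominate a MoE model's size but only a few experts are active per token, so keeping them in RAM frees VRAM for KV cache (i.e. context) at a moderate speed cost.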