francisduvivier 3 hours ago
Mainly, I think he can move much faster by making improvements that specifically target DeepSeek on systems with unified memory (Mac or Strix). It's a lot easier to optimize when you don't need to worry about all the other architectures. So optimize he did, and it's just a lot faster than llama.cpp for DeepSeek V4 Pro and Flash. More interesting features also become doable, like SSD streaming, which makes it possible to load MoE weights for a model larger than your VRAM. I don't see that landing in llama.cpp anytime soon.
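
To make the SSD streaming idea concrete, here is a rough sketch of the general principle, not the project's actual code: expert weights stay in a file on disk, the file is memory-mapped, and only the experts the router picks are copied into a small LRU cache. Everything named here (`ExpertStreamer`, `write_fake_experts`, `EXPERT_BYTES`, the flat one-expert-after-another file layout) is made up for illustration, and a real implementation would deal with quantized tensors, prefetching and GPU buffers.

```python
import mmap
import os
import tempfile
from collections import OrderedDict

EXPERT_BYTES = 1024  # hypothetical size of one expert's weights, in bytes


class ExpertStreamer:
    """Keeps at most `capacity` experts in RAM; the rest stay on disk."""

    def __init__(self, path, num_experts, capacity):
        self.num_experts = num_experts
        self.capacity = capacity
        self._file = open(path, "rb")
        # Map the file read-only; pages are only fetched when they are touched.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._resident = OrderedDict()  # expert_id -> bytes, in LRU order
        self.loads = 0  # number of times an expert was copied out of the mapping

    def get(self, expert_id):
        if not 0 <= expert_id < self.num_experts:
            raise IndexError(expert_id)
        if expert_id in self._resident:
            self._resident.move_to_end(expert_id)  # mark as most recently used
            return self._resident[expert_id]
        start = expert_id * EXPERT_BYTES
        weights = self._map[start:start + EXPERT_BYTES]  # slice copies the bytes
        self.loads += 1
        self._resident[expert_id] = weights
        if len(self._resident) > self.capacity:
            self._resident.popitem(last=False)  # evict the least recently used
        return weights

    def close(self):
        self._map.close()
        self._file.close()


def write_fake_experts(path, num_experts):
    """Writes placeholder weights: expert i is EXPERT_BYTES copies of byte i."""
    with open(path, "wb") as f:
        for i in range(num_experts):
            f.write(bytes([i % 256]) * EXPERT_BYTES)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "experts.bin")
    write_fake_experts(path, num_experts=8)
    streamer = ExpertStreamer(path, num_experts=8, capacity=2)
    for expert_id in [3, 3, 5, 7]:  # pretend these came from the router
        streamer.get(expert_id)
    print("loads from the mapping:", streamer.loads)
    streamer.close()
```

This works for MoE in particular because each token only activates a few experts, so the set of weights that has to be resident at any moment is much smaller than the full model.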