ljosifov 5 hours ago
Running a 27B dense model on an M5 with 128GB is OK, but one can do better. With 128GB one can put the RAM to use and run a sparse MoE instead. For example, DeepSeek-V4-Flash will fit, served by DwarfStar (https://github.com/antirez/ds4). That will probably give about 2x the tokens/sec, given that DS4F's 13B activated params in the MoE are ~1/2 of the ~27B of the dense Qwen. The 27B Qwen fits even on a cheaper 24GB card, e.g. an AMD 7900 XTX (<$1K?) or a slightly dearer NVIDIA 3090 (with CUDA). With ~900 GB/s of memory bandwidth they will likely be ~50% faster than the M5 with its 600 GB/s.
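The arithmetic behind both estimates can be sketched roughly as follows. This assumes decoding is purely memory-bandwidth-bound (every active weight is read once per generated token) and that both models use the same quantization; the 4 bits per weight and the helper name `decode_tokens_per_sec` are illustrative assumptions, not figures from the comment. KV-cache reads, compute limits, and runtime overhead are ignored, so the absolute numbers are upper bounds at best and only the ratios are meaningful.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          active_params_b: float,
                          bits_per_weight: float) -> float:
    """Bandwidth-bound upper limit on decode speed.

    Assumes each active weight is read from memory exactly once per token.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


BITS = 4.0  # assumed quantization, identical for both models (hypothetical)

dense_m5 = decode_tokens_per_sec(600, 27, BITS)   # ~27B dense Qwen on the M5
moe_m5 = decode_tokens_per_sec(600, 13, BITS)     # 13B activated params (DS4F) on the M5
dense_gpu = decode_tokens_per_sec(900, 27, BITS)  # ~27B dense Qwen on a ~900 GB/s card

print(f"MoE vs dense on M5: {moe_m5 / dense_m5:.2f}x")          # 2.08x
print(f"Dense on ~900 GB/s card vs M5: {dense_gpu / dense_m5:.2f}x")  # 1.50x
```

The quantization level cancels out of both ratios as long as it is the same for the models being compared; if the MoE has to be quantized harder to fit, its speed advantage grows while its quality may drop.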
brandall10 3 hours ago
This is discussed in the article: "My personal impression is that within these quantizations Qwen 3.6 27B is as good as (or maybe slightly better than) DwarfStar4. Though, I won’t be surprised if for longer context projects DS4 has an edge."
drnick1 4 hours ago
Works beautifully on a 3090, very usable speed. Don't expect Opus 4.8-level performance, but there are some things you just need to keep local.
kroaton 2 hours ago
"DeepSeek-V4-Flash will fit" At Q2, 2-bit? Lobotomized to death.