torginus 4 hours ago
Why doesn't Qwen itself release the quantized model? My impression is that quantization is a highly nontrivial process that can degrade a model in non-obvious ways, so it's best handled by the people who actually built the model; otherwise the results may be disappointing. Users might even conclude that the model itself is bad when it's really just the quantized version that is.
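To make the concern concrete: quantization maps float weights onto a small integer grid, and the rounding introduces error. Below is a toy sketch of symmetric per-tensor int8 quantization in pure Python; it is purely illustrative and nothing like what production pipelines (llama.cpp K-quants, AWQ, GPTQ, etc.) actually do, which involve per-block scales, calibration data, and outlier handling.

```python
import random

random.seed(0)
# Fake "weight tensor": 4096 floats drawn from a typical weight distribution.
w = [random.gauss(0, 0.02) for _ in range(4096)]

# Symmetric per-tensor scheme: one scale maps the largest magnitude to 127.
scale = max(abs(x) for x in w) / 127

# Quantize: divide by the scale, round to the nearest integer, clamp to int8 range.
q = [max(-127, min(127, round(x / scale))) for x in w]

# Dequantize: multiply back. The round trip loses at most scale/2 per weight.
w_hat = [qi * scale for qi in q]

err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"max abs round-trip error: {err:.6f} (scale = {scale:.6f})")
```

The worst-case per-weight error is half the quantization step, so a single outlier weight (which inflates the scale) hurts the precision of every other weight in the tensor; that is one reason real schemes quantize in small blocks with their own scales.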
bityard 3 hours ago
Model developers release open-weight models for all sorts of reasons, but the most common is to share their work with the greater AI research community. They might allow or even encourage personal and commercial use of the model, but they don't necessarily want to be responsible for end-user support.

An imperfect analogy is the Linux kernel: Linus publishes official releases as a tagged source tree, but most people who run Linux use a kernel that has been tweaked, built, and packaged by someone else.

That said, models often DO come from the factory in multiple quants. Here's the FP8 quant for Qwen3.6, for example: https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8

Unsloth and other organizations produce a wider variety of quants than upstream, to fit a wider variety of hardware and to let end users make their own size/quality trade-offs.
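The size side of that trade-off is simple arithmetic: weight storage is roughly parameter count times bits per weight. A back-of-the-envelope sketch for a 35B-parameter model (illustrative only; real GGUF files add metadata and keep some tensors, like embeddings, at higher precision, and the ~4.8 bits-per-weight figure for Q4_K_M is an approximation):

```python
PARAMS = 35e9  # total parameters (the "35B" in a 35B model)

def weight_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes: params * bits / 8 bits-per-byte."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bpw in [("FP16", 16), ("FP8", 8), ("Q4_K_M (~4.8 bpw)", 4.8)]:
    print(f"{name:>18}: ~{weight_gb(bpw):.0f} GB")
```

This is why a single upstream FP8 release doesn't cover everyone: halving again from FP8 to ~4.8 bpw is the difference between fitting and not fitting on a consumer GPU.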
halJordan 3 hours ago
Quantization is an extraordinarily trivial process, especially if you're doing it with llama.cpp (which Unsloth obviously does). And Qwen did release an FP8 version, which is a quantized version.