| ▲ | danielhanchen 7 hours ago |
| UD stands for "Unsloth-Dynamic", which upcasts important layers to higher bits. Non-UD files are just standard llama.cpp quants. Both still use our calibration dataset. |
|
| ▲ | CamperBob2 6 hours ago | parent | next [-] |
| Please consider authoring a single, straightforward introductory-level page somewhere that explains what all the filename components mean, and who should use which variants. The green/yellow/red indicators for different levels of hardware support are really helpful, but far from enough IMO. |
| |
| ▲ | danielhanchen 6 hours ago | parent | next [-] |
| Oh, good idea! In general, UD-Q4_K_XL (Unsloth Dynamic 4-bit Extra Large) is what I recommend for most hardware; MXFP4_MOE is also OK. |
| ▲ | Keats 4 hours ago | parent [-] |
| Is there some indication of how the different bit quantizations affect performance? I.e., I have a 5090 + 96GB, so I want to get the best possible model, but I don't care about getting 2% better perf if I only get 5 tok/s. |
| ▲ | mirekrusin 3 hours ago | parent [-] |
| It takes download time + 1 minute to test speed yourself, and you can try different quants. It's hard to write down a table because it depends on your system (e.g. RAM clock) if you spill out of the GPU. I guess it would make sense to have something like max context size/quants that fit fully on common configs with GPUs, dual GPUs, unified RAM on Mac, etc. |
| ▲ | Keats 3 hours ago | parent [-] |
| Testing speed is easy, yes. I'm mostly wondering about the quality difference between Q6 and Q8_K_XL, for example. |
|
|
| |
| ▲ | segmondy 6 hours ago | parent | prev [-] |
| The green/yellow/red indicators are based on what you set for your hardware on Hugging Face. |
|
|
| ▲ | ranger_danger an hour ago | parent | prev [-] |
| What is your definition of "important" in this context? |