Keats 5 hours ago
Is there any indication of how the different quantization bit widths affect performance? I.e., I have a 5090 + 96 GB of RAM, so I want to get the best possible model, but I don't care about getting 2% better performance if I only get 5 tok/s.
mirekrusin 3 hours ago
It takes the download time plus about a minute to test the speed yourself, and you can try different quants. It's hard to write it down as a table because it depends on your system, e.g. RAM clock speed, once the model spills out of GPU memory. I guess it would make sense to have something like a table of the maximum context size per quant that fits fully on common configs: a single GPU, dual GPUs, unified memory on a Mac, etc.
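A rough back-of-envelope sketch (not a substitute for actually measuring): token generation on a dense model is mostly memory-bandwidth bound, because every weight is read once per generated token. The size of the quantized weights plus the KV cache for your context decides whether everything fits in VRAM, and whatever spills over is read at system-RAM speed, which is why RAM clock matters. The model shape, bandwidth figures, and helper names below are hypothetical and chosen only for illustration. Q4_0 and Q8_0 are llama.cpp block formats that work out to about 4.5 and 8.5 bits per weight once the per-block scales are included.

```python
from dataclasses import dataclass

GIB = 1024**3


@dataclass
class Model:
    # Hypothetical dense model description; all values are illustrative assumptions.
    params: float    # total parameter count
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weight_bytes(model: Model, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights (ignores metadata and unquantized tensors)."""
    return model.params * bits_per_weight / 8


def kv_cache_bytes(model: Model, context: int, bytes_per_elem: int = 2) -> int:
    """K and V for every layer: 2 * layers * kv_heads * head_dim * context * bytes (fp16 by default)."""
    return 2 * model.n_layers * model.n_kv_heads * model.head_dim * context * bytes_per_elem


def est_tokens_per_s(weights: float, vram_free: float, gpu_bw: float, cpu_bw: float) -> float:
    """Optimistic bandwidth-bound estimate for dense decoding.

    Assumes every weight is read once per token; weights that do not fit in the
    free VRAM are assumed to be read from system RAM at cpu_bw (bytes/s).
    """
    on_gpu = min(weights, max(vram_free, 0.0))
    on_cpu = weights - on_gpu
    seconds_per_token = on_gpu / gpu_bw + on_cpu / cpu_bw
    return 1 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical 70B dense model with grouped-query attention (illustrative numbers only).
    m = Model(params=70e9, n_layers=80, n_kv_heads=8, head_dim=128)
    vram = 32 * GIB   # RTX 5090: 32 GB of VRAM
    gpu_bw = 1.79e12  # assumed GPU memory bandwidth, bytes/s
    cpu_bw = 90e9     # assumed system RAM bandwidth, bytes/s (depends on channels and clock)
    kv = kv_cache_bytes(m, context=8192)
    for name, bpw in [("Q4_0", 4.5), ("Q8_0", 8.5), ("F16", 16.0)]:
        w = weight_bytes(m, bpw)
        tps = est_tokens_per_s(w, vram - kv, gpu_bw, cpu_bw)
        print(f"{name}: weights {w / GIB:.1f} GiB, KV {kv / GIB:.1f} GiB, ~{tps:.1f} tok/s at best")
```

It ignores compute limits, prompt processing, and runtime buffers, so treat the number as a ceiling rather than a prediction. The main thing it shows is how quickly the RAM-resident share dominates once a quant no longer fits in VRAM. For mixture-of-experts models, only the active parameters are read per token, so the speed side of the estimate would need the active size instead of the total.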