mirekrusin 3 hours ago

Testing speed yourself takes the download time plus about a minute, and you can try different quants. It's hard to write down a table because the numbers depend on your system (RAM clock, etc.) once the model spills out of GPU memory.

I guess it would make sense to have something like the max context size and quants that fit fully on common configs: single GPU, dual GPUs, unified RAM on a Mac, etc.
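A rough sketch of how such a "does it fit" estimate could be computed, in case it helps. Everything here is an assumption for illustration: the function names are made up, the fixed overhead is a guess, the KV cache is assumed to be stored at 2 bytes per element, and the effective bits per weight of a given quant should be derived from the actual file size rather than taken from the quant's name.

```python
GIB = 2**30

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB (keys + values, hence the factor 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / GIB

def max_context(vram_gib: float, n_params: float, bits_per_weight: float,
                n_layers: int, n_kv_heads: int, head_dim: int,
                bytes_per_elem: int = 2, overhead_gib: float = 1.0) -> int:
    """Largest context length whose KV cache fits next to the weights.

    overhead_gib is a hypothetical allowance for buffers and runtime state.
    Returns 0 when the weights alone do not fit.
    """
    free_gib = vram_gib - overhead_gib - weights_gib(n_params, bits_per_weight)
    if free_gib <= 0:
        return 0
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return int(free_gib * GIB // bytes_per_token)

if __name__ == "__main__":
    # Hypothetical model shape, not taken from any real model card.
    print(max_context(vram_gib=24, n_params=8 * GIB, bits_per_weight=8,
                      n_layers=32, n_kv_heads=8, head_dim=128))
```

This only answers whether a config fits fully in GPU memory. It says nothing about speed once layers are offloaded to system RAM, which is the part that depends on your hardware.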

Keats 3 hours ago

Testing speed is easy, yes. I'm mostly wondering about the quality difference between Q6 and Q8_K_XL, for example.