fcpk (2 days ago):
Something I have been wondering about is doing regressive, layer-specific quantization based on large test sets, i.e. aggressively reducing precision only in those layers where doing so doesn't hurt general quality.
buildbot (2 days ago):
This is a thing! For example, https://arxiv.org/abs/2511.06516
woadwarrior01 (a day ago):
This is a very well-established idea. It's called dynamic quantization: vary the quantization bit-width (or skip quantization altogether) on a layer-by-layer basis, using a calibration dataset. EvoPress is the first thing that comes to mind when I think of dynamic quantization.
qskousen (a day ago):
I've experimented with this on diffusion models using a safetensors-to-GGUF tool I wrote. Even with relatively few sample images (~10k, still enough to keep my 3090 spinning for days straight), the benefits are quite noticeable: a smaller file with overall better results.