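What models are you testing? A 120B model with hybrid attention should fit comfortably within 80 GB of VRAM at a 4-bit quant: the weights alone come to roughly 60 GB at 4 bits per parameter (a bit more once quantization scales are counted), and hybrid attention keeps the KV cache relatively small. Also, 4-bit quants that are done well are generally fine. They certainly don’t make the model unusable.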
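For a rough sanity check of the memory claim, here is a back-of-the-envelope sketch. It only counts weights; the function name `quantized_weight_gb`, the 4.5 bits/weight figure (a guess at the overhead from quantization scales), and the 80 GB budget are assumptions for illustration, and real usage also depends on KV cache, activations, and runtime overhead.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


VRAM_GB = 80        # assumed card size
N_PARAMS = 120e9    # 120B parameters

# 4.5 bits/weight is an assumed effective rate once scales are included.
for bits in (4.0, 4.5, 8.0, 16.0):
    weights = quantized_weight_gb(N_PARAMS, bits)
    headroom = VRAM_GB - weights
    print(f"{bits:>4} bits/weight: {weights:6.1f} GB weights, {headroom:6.1f} GB headroom")
```

At 4 bits this leaves around 20 GB for everything else; at 8 bits the weights alone already exceed 80 GB.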