CamperBob2 2 hours ago
You can run the NVFP4 quant on 8x RTX 6000 cards at 50-75 tps output, but not (practically speaking) the OEM FP8 version. You will learn more about PCIe than you ever wanted to know. The real gangstas are running 16x RTX 6000s. Too rich for my blood, and the NVFP4 quant doesn't seem to be that much worse.
Sanzig 29 minutes ago
Anyone done any benchmarks on the NVFP4 quant? Seriously considering pitching an 8x RTX 6000 Pro box at work to run GLM-5.2 in an air-gapped environment.