| ▲ | wirybeige 2 hours ago | |
DS4 Pro/Flash were post trained with QAT, so they are already quantized to FP4 for the most part. That's why when downloading the weights, they are much smaller than what their weights at fp8 or fp16 would be. For example, Flash is a 284B model, but its GB size is only ~160GB. OFC maybe DeeppInfra went even further, but there is no proof of that. | ||