irthomasthomas 7 days ago
Oh, I didn't know that. Weird!
reissbaker 7 days ago
It was trained natively in FP4, probably both to reduce VRAM usage at inference time (it fits on a single H100) and to make better use of B200s, which are especially fast at FP4.
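To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch. The parameter count is a hypothetical placeholder (the thread does not state one here), the 80 GB figure assumes the 80 GB H100 variant, and the estimate covers weights only, ignoring KV cache, activations, and any per-block scaling factors an FP4 format may carry.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption for illustration only: not a figure taken from the thread.
HYPOTHETICAL_PARAMS = 120e9
# Assumption: the 80 GB H100 variant.
H100_VRAM_GB = 80

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    gb = weight_memory_gb(HYPOTHETICAL_PARAMS, bits)
    verdict = "fits" if gb < H100_VRAM_GB else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of weights, {verdict} in {H100_VRAM_GB} GB")
```

Under those assumptions, only the 4-bit weights come in under a single card's memory, which is the gist of the argument above.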