selcuka 6 hours ago
> we don't switch to heavily quantized models

That sounded like a press bulletin, so just to give you a chance to clarify: does that mean you may switch to lightly quantized models?
jychang 6 hours ago | parent
There's almost a 0% chance that OpenAI doesn't quantize the model right off the bat. I'd bet a large amount of money that OpenAI would never serve a model fully in BF16 in the year of our lord 2026; that would be operationally insane. They're almost certainly doing QAT to FP4 for the FFN weights, with a similar or slightly larger quant for the attention tensors.
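To make the "QAT to FP4" step concrete, here is a minimal sketch of fake-quantizing FFN weights onto the FP4 (E2M1) grid with per-block absmax scaling. The block size, tensor shapes, and function names are illustrative assumptions, not anything OpenAI has disclosed:

    # Sketch of QAT-style "fake quantization" to FP4 (E2M1).
    # Block size and layer shapes are hypothetical, not OpenAI's recipe.
    import numpy as np

    # Representable magnitudes of the FP4 E2M1 format.
    FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def fake_quant_fp4(w: np.ndarray, block: int = 32) -> np.ndarray:
        """Quantize-dequantize weights so training sees FP4 rounding error.
        Assumes w.size is divisible by the block size."""
        flat = w.reshape(-1, block)
        # Per-block absmax scale maps the largest magnitude to the top grid value.
        scale = np.abs(flat).max(axis=1, keepdims=True) / FP4_GRID[-1]
        scale = np.where(scale == 0, 1.0, scale)   # avoid divide-by-zero on all-zero blocks
        scaled = flat / scale
        # Round each magnitude to the nearest FP4 grid point, keeping the sign.
        mag = np.abs(scaled)
        idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
        deq = np.sign(scaled) * FP4_GRID[idx] * scale
        return deq.reshape(w.shape)

    # During QAT, the forward pass uses the fake-quantized FFN weights while the
    # optimizer keeps updating the full-precision copy (straight-through estimator).
    w_ffn = np.random.randn(4096, 128).astype(np.float32)   # hypothetical FFN weight
    w_used_in_forward = fake_quant_fp4(w_ffn)

The point of training this way is that the deployed FP4 weights behave the way the model was trained to expect, so serving in FP4 costs far less accuracy than quantizing after the fact.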