qeternity 4 days ago
Quantization is not some magical dial you can just turn. In practice you basically have three choices: fp16, fp8 and fp4. Also, more thinking time means more tokens, which costs more, especially at the API level, where you pay per token and any change would be trivially observable. There is basically no evidence that either of these is happening in the way you suggest (being boosted up and down).
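To make the per-token point concrete, here is a rough sketch of how a change in thinking length would show up on an API bill. The price and token counts are made up for illustration, and it assumes reasoning tokens are billed as output tokens. Real rates and billing rules vary by provider and model.

```python
# Hypothetical price, assumed for illustration only (USD per 1M output tokens).
PRICE_PER_MILLION_OUTPUT_TOKENS = 10.00


def output_cost(output_tokens: int,
                price_per_million: float = PRICE_PER_MILLION_OUTPUT_TOKENS) -> float:
    """Return the billed cost in USD for a given number of output tokens."""
    return output_tokens / 1_000_000 * price_per_million


# Same prompt with two made-up reasoning budgets.
# Assumption: reasoning tokens are counted and billed as output tokens.
short_thinking = output_cost(2_000)
long_thinking = output_cost(8_000)

print(f"short: ${short_thinking:.2f}, long: ${long_thinking:.2f}")
```

Under these assumptions, cost scales linearly with token count, so a provider quietly changing the thinking budget would show up directly in the usage numbers API customers already see.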
Workaccount2 4 days ago | parent
API users probably wouldn't be affected since they are paying in full. Most people complaining are free users, followed by $20/mo users.