▲ | numeri 5 days ago | |
This isn't right – calibration (informally, the degree to which certainty in the model's logits correlates with its chance of getting an answer correct) is well studied in LLMs of all sizes. LLMs are not (generally) well calibrated. |