▲ | chaos_emergent 14 hours ago | |
Yes exactly, my theory is that the novelty of a new generation of LLMs’ performances tends to cause an inflation in peoples’ perceptions of the model, with a reversion to a better calibrated expectation over time. If the developer reported numerical evaluations that drifted over time, I’d be more convinced of model change. |