▲ | penteract 3 days ago | |
From the report: > To calculate the energy consumption for the median Gemini Apps text prompt on a given day, we first determine the average energy/prompt for each model, and then rank these models by their energy/prompt values. We then construct a cumulative distribution of text prompts along this energy-ranked list to identify the model that serves the 50-th percentile prompt. They are measuring more than one model. I assume this statement describes how they chose which model to report the LM arena score for, and it's a ridiculous way to do so - the LM arena score calculated this way could change dramatically day-to-day. |