|
| ▲ | qnleigh 3 hours ago | parent | next [-] |
| As far as I understand, this is exactly how Elo scores work. If a more capable model shows up and starts beating all the other models, it literally takes Elo points from everyone else. https://en.wikipedia.org/wiki/Elo_rating_system |
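To make the "takes points from everyone else" claim concrete, here is a minimal sketch of the textbook pairwise Elo update. It assumes both players share the same K-factor (32 here) and the standard 400-point logistic scale; the model names and the match sequence are hypothetical, and a real leaderboard may compute its ratings differently.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return the new (r_a, r_b) after one game; score_a is 1, 0.5 or 0."""
    delta = k * (score_a - expected_score(r_a, r_b))
    # With a shared K-factor, whatever A gains is exactly what B loses.
    return r_a + delta, r_b - delta


# Hypothetical pool: two existing models and one newcomer, all starting equal.
ratings = {"old_1": 1500.0, "old_2": 1500.0, "newcomer": 1500.0}
total_before = sum(ratings.values())

# Assumed scenario: the newcomer beats each existing model once.
for rival in ("old_1", "old_2"):
    ratings["newcomer"], ratings[rival] = update(
        ratings["newcomer"], ratings[rival], 1.0
    )

total_after = sum(ratings.values())
print(ratings, total_before, total_after)
```

Under these assumptions the pool total is conserved, so the newcomer's gain comes directly out of the ratings of the models it beat. Whether a given chart shows that drop depends on how its ratings are actually fitted and anchored.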
| |
| ▲ | TekMol an hour ago | parent | next [-] | | > If a more capable model shows up and starts beating all the other models
There is an instance of this in the chart: on 2025-06-24, when Gemini-2.5-pro shows up. As you can see, the Elo of the others does not drop. | |
| ▲ | harperlee 3 hours ago | parent | prev [-] | | Depends on the test design: is an agent competing against another agent in a given match, or against a test? Also, does the test's Elo fluctuate? |
|
|
| ▲ | tasuki 2 hours ago | parent | prev | next [-] |
| Yes, that is in fact how Elo can work[0]. There are quite a few ways Elo systems can work. [0]: https://en.wikipedia.org/wiki/Elo_rating_system |
|
| ▲ | whiplash451 3 hours ago | parent | prev [-] |
| It depends on what you use as an anchor. If the anchor is a fixed model, you’re right. If the anchor is updated to a better model over time, then the Elo of historical models degrades, right? |
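The anchor point can be illustrated with a small sketch. Elo predictions depend only on rating differences, so shifting every rating by the same constant changes nothing about the expected results. The helper `reanchor`, the model names, and the numbers below are all hypothetical and only show what pinning different models to the same reference value does to the displayed scores.

```python
def reanchor(
    ratings: dict[str, float], anchor: str, anchor_value: float = 1000.0
) -> dict[str, float]:
    """Shift all ratings by one constant so that `anchor` sits at `anchor_value`."""
    shift = anchor_value - ratings[anchor]
    return {name: rating + shift for name, rating in ratings.items()}


# Hypothetical raw ratings; only the gaps between them carry meaning.
raw = {"model_2023": 1000.0, "model_2024": 1100.0, "model_2025": 1250.0}

# Fixed anchor: the oldest model stays pinned at 1000.
fixed = reanchor(raw, "model_2023")

# Moving anchor: the newest model is now the one pinned at 1000.
moved = reanchor(raw, "model_2025")

print(fixed)
print(moved)
```

In this sketch the gap between any two models is identical under both anchors, but the displayed number for `model_2023` falls from 1000 to 750 once the anchor moves to the stronger model. That matches the intuition in the comment above: a fixed anchor keeps historical scores stable, while a moving one pushes them down.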