| ▲ | energy123 5 hours ago | |
Yet people don't use old models through the API much, because changes in benchmark space don't map linearly to changes in utility space. An improvement from 98% to 99% is only 1pp, but it halves the error rate, so it might be 2x as valuable for some applications. Also, benchmark scores will asymptote no matter what, since they are capped at 100%; that's baked in. | ||
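To put rough numbers on the 98% vs 99% point, here is a minimal sketch. The function names and the multi-step framing are my own assumptions for illustration: it treats benchmark accuracy as a per-step success probability and assumes steps fail independently, which real workloads may not.

```python
def error_rate(accuracy: float) -> float:
    """Fraction of attempts that fail at a given accuracy."""
    return 1.0 - accuracy


def chain_success(accuracy: float, steps: int) -> float:
    """Probability that `steps` consecutive steps all succeed.

    Assumption (not from the comment above): steps are independent
    and each succeeds with probability `accuracy`.
    """
    return accuracy ** steps


old_acc, new_acc = 0.98, 0.99

# A 1pp gain in accuracy cuts the error rate roughly in half.
error_ratio = error_rate(old_acc) / error_rate(new_acc)
print(f"old errors / new errors: {error_ratio:.2f}")

# The gap in end-to-end success widens as tasks get longer.
for steps in (1, 10, 50, 100):
    old_p = chain_success(old_acc, steps)
    new_p = chain_success(new_acc, steps)
    print(f"{steps:>3} steps: {old_p:.3f} -> {new_p:.3f}")
```

Under these assumptions the 1pp difference is nearly invisible on a single step but becomes large on long chains, which is one way a small benchmark delta can translate into a big difference in usefulness.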