a2128 2 days ago
I tried the vote and both results always suck; there's no option to say neither is the winner. Also, it seems from the network tab that you're sending 4 (or 5?) requests but only displaying the first two that respond. That biases it toward the smaller models that respond more quickly, which usually means showing two bad results.
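The selection effect described here can be illustrated with a small simulation. It assumes four hypothetical models whose latencies are exponentially distributed with very different means. This is illustrative only and is not measured from the actual site. If the UI shows whichever two respond first, an unbiased scheme would put each model in the displayed pair in about half the trials (2 of 4), but the fast models crowd out the slow ones.

```python
import random
from collections import Counter


def first_two(latencies: dict[str, float]) -> tuple[str, ...]:
    """Return the two models with the lowest latency, i.e. the first two to respond."""
    return tuple(sorted(latencies, key=latencies.get)[:2])


def show_rates(mean_latency: dict[str, float], trials: int = 10_000, seed: int = 0) -> dict[str, float]:
    """Fraction of trials in which each model ends up in the displayed pair."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        # Hypothetical latency model: exponential around each model's mean (seconds).
        sampled = {m: rng.expovariate(1 / mu) for m, mu in mean_latency.items()}
        counts.update(first_two(sampled))
    return {m: counts[m] / trials for m in mean_latency}


# Hypothetical mean latencies: two small, fast models and two large, slow ones.
rates = show_rates({"small-1": 2.0, "small-2": 3.0, "large-1": 30.0, "large-2": 60.0})
```

Each trial displays exactly two models, so the rates always sum to 2. Any model far above 0.5 is being shown because it is fast, not because it is good.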
grace77 2 days ago
Yes, great point. We originally waited for all model responses and randomized the vote order, but that made for a very bad user experience: some models, especially open-source ones, took over 4 minutes to respond, which led to a high voter drop-off rate. To preserve the voter experience without introducing bias, our current approach waits for the slower model within each binary comparison, so even if one model is faster, we don't display the pair until both are ready. You're right that this still introduces some bias toward the two smallest models, and we'd love to hear suggestions for how to make this better!

As for the 5th request: we kick off one reserve model alongside the four randomly selected for the tournament. This backup isn't shown unless one of the four fails. It isn't picked for being the fastest or lowest-latency model; it's just a randomly selected fallback that keeps the system robust without skewing results.
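Here is a minimal asyncio sketch of the first round of the scheme described above. It shows two properties: a comparison is released only when its slower model finishes, and the single reserve is used only to replace a failed contestant. Everything concrete here is a hypothetical assumption, not the site's actual code: `call_model`, the model names and latencies, and the fixed pairing (first two contestants vs. last two).

```python
import asyncio
import random


async def call_model(name, latency, failing):
    """Hypothetical stand-in for a model API call: sleeps, then returns or fails."""
    await asyncio.sleep(latency)
    if name in failing:
        raise RuntimeError(f"{name} failed")
    return f"<design from {name}>"


async def await_pair(pair, tasks, reserve_pool):
    """Return a comparison only once BOTH of its models are ready.

    If a model fails, the first pair to hit the failure claims the single
    reserve. Returns None if the pair cannot be completed.
    """
    shown = []
    for name in pair:
        try:
            shown.append((name, await tasks[name]))
        except RuntimeError:
            if not reserve_pool:
                return None  # reserve already used: drop this comparison
            backup = reserve_pool.pop()
            try:
                shown.append((backup, await tasks[backup]))
            except RuntimeError:
                return None
    return tuple(shown)


def pick_models(pool, rng):
    """Randomly choose four contestants plus one reserve."""
    picked = rng.sample(pool, 5)
    return picked[:4], picked[4]


async def run_round(contestants, reserve, latencies, failing=frozenset()):
    """Start all five requests at once; collect pairs in the order they become ready."""
    tasks = {
        m: asyncio.create_task(call_model(m, latencies[m], failing))
        for m in [*contestants, reserve]
    }
    reserve_pool = [reserve]
    pairs = [tuple(contestants[:2]), tuple(contestants[2:])]
    displayed = []
    for next_ready in asyncio.as_completed(
        [await_pair(p, tasks, reserve_pool) for p in pairs]
    ):
        comparison = await next_ready
        if comparison is not None:
            displayed.append(comparison)  # a real UI would render this pair now
    for task in tasks.values():
        task.cancel()  # stop an unused reserve that may still be running
    return displayed


LATENCIES = {"a": 0.01, "b": 0.10, "c": 0.03, "d": 0.05, "e": 0.07}  # seconds, hypothetical
```

For example, `asyncio.run(run_round(["a", "b", "c", "d"], "e", LATENCIES))` shows the (c, d) pair first, because its slower model (d) finishes before b does. In this sketch, the pairing is fixed before any request returns. Latency therefore affects the order in which comparisons appear, not which models get paired.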
ethan_smith 2 days ago
Adding a "neither is good" option would improve data quality by preventing forced choices between two poor designs.
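Below is one possible way to record such votes. The `(left_model, right_model, choice)` vote format and the `tally` helper are hypothetical. The idea is that "neither" votes are kept out of each model's win rate, so they don't count as forced wins or losses. They are still tracked as a separate per-model rejection rate.

```python
from collections import defaultdict

CHOICES = {"left", "right", "neither"}


def tally(votes):
    """Summarize (left_model, right_model, choice) votes per model.

    Win rate uses only decisive votes; "neither" votes are tracked separately
    as a rejection rate instead of being forced into a win or a loss.
    """
    shown = defaultdict(int)     # comparisons a model appeared in
    decisive = defaultdict(int)  # comparisons that produced a winner
    wins = defaultdict(int)
    rejected = defaultdict(int)  # comparisons where the voter picked "neither"
    for left, right, choice in votes:
        if choice not in CHOICES:
            raise ValueError(f"unknown choice: {choice!r}")
        shown[left] += 1
        shown[right] += 1
        if choice == "neither":
            rejected[left] += 1
            rejected[right] += 1
            continue
        decisive[left] += 1
        decisive[right] += 1
        wins[left if choice == "left" else right] += 1
    return {
        m: {
            "win_rate": wins[m] / decisive[m] if decisive[m] else None,
            "neither_rate": rejected[m] / shown[m],
        }
        for m in shown
    }


summary = tally([("a", "b", "left"), ("a", "b", "neither"), ("b", "c", "right")])
```

One reason to keep "neither" separate rather than scoring it as a tie is that a tie says two designs are equally good. "Both bad" says something different.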