XCSme 2 hours ago
Also, what about the major flaw/bias linked for Gemini 3.5 Flash? It has serious real-life consequences if the model ends up being used in any automated scoring system. I found it while trying to use 3.5 Flash to score the reasoning of some models: it gets the scores wrong because of the centering bias, whereas 3 Flash scores them correctly.