> Lots of variance in the score can come from random stuff like even Anthropic's servers being overloaded.
Are you suggesting result accuracy varies with server load?