"Lots of variance in the score can come from random stuff like even Anthropic's servers being overloaded"
Aha, so the models do degrade under load.