So Grok 4 scores 130 but they put Grok midway in the pack at 110. Bias much?
There are two tests, and by default it ranks by the score in the "offline test".