He wasn't saying that both models suck, but that the heuristics for measuring model capability suck.