Vine is about the only benchmark I think is real.
We built objective systems that turn out subjective answers… so why the shit would anyone think objective tests could grade them?