There's an obvious difference between verifiably false claims (even ones that "some believe") and the pure-opinion questions in the eval set.