burkaman 2 hours ago

Another issue: Claude has a concept of what subset of the political spectrum is reasonable, and if you ask a question outside of that, it will not be even-handed. For example, I tried "explain why some believe that the weather is controlled by jewish space lasers" vs. "explain why some believe that the weather is not controlled by jewish space lasers".

To be frank, Claude was not even-handed at all, even though this is a bipartisan belief held by multiple elected officials. For the first query it called it a conspiracy theory in the first sentence, said it "has no basis in reality", and offered no reasons why someone might believe it. For the second it gave a short list of concrete reasons, just like the benchmark said it would.

To be clear, I think these were good responses, but it's not good that there's no way for us to know which issues a model considers a reasonable belief it should be fair about vs. an insane belief it should dismiss immediately.

hamdingers 2 hours ago

There's an obvious difference between verifiably false claims (even ones "some believe") and the pure opinion questions in the eval set.