| ▲ | singpolyma3 2 hours ago | |||||||
False vs misleading doesn't seem like a disagreement? | ||||||||
| ▲ | wongarsu 2 hours ago | parent | next [-] | |||||||
According to the benchmark it is. "Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model's verdict is label-inconsistent under this 4-bucket rubric (True / Mostly True / Misleading / False)" | ||||||||
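A minimal sketch of the strict rule quoted above: under the 4-bucket rubric, a panel counts as disagreeing as soon as two verdicts differ, however close the buckets are. The function name `panel_disagrees` and the list-of-strings input are assumptions made for illustration, not the benchmark's code.

```python
# The four verdict buckets named in the quoted rubric.
BUCKETS = {"True", "Mostly True", "Misleading", "False"}


def panel_disagrees(verdicts):
    """Return True if the panel's verdicts for one claim are not all identical.

    `verdicts` is a hypothetical list with one bucket label per model.
    """
    unknown = set(verdicts) - BUCKETS
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    # Exact-match rule: any two different labels count as a disagreement.
    return len(set(verdicts)) > 1
```

By this rule `["False", "Misleading"]` is a disagreement just like `["True", "False"]`, which is the point being made in the thread.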
| ||||||||
| ▲ | kostaj 2 hours ago | parent | prev [-] | |||||||
Yes, they are much closer verdicts. True and Mostly True are also close. I used Krippendorff's α (ordinal) so that closer disagreements are penalized less than distant ones. For 21% of the claims, models land on polar opposite sides: at least one True and at least one False. | ||||||||
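For readers unfamiliar with the metric, here is a rough sketch of ordinal Krippendorff's α and of the "polar opposite" share. It is not the benchmark's implementation: the bucket ordering in `BUCKETS`, the function names, and the input shape (one list of verdict labels per claim) are all assumptions.

```python
from collections import Counter

# Assumed ordinal order of the rubric, from lowest to highest.
BUCKETS = ["False", "Misleading", "Mostly True", "True"]
RANK = {label: i for i, label in enumerate(BUCKETS)}


def ordinal_alpha(units):
    """Krippendorff's alpha with the ordinal distance metric.

    `units` holds one list of verdict labels per claim (hypothetical shape).
    Claims with fewer than two verdicts are skipped.
    """
    k = len(BUCKETS)
    # Coincidence matrix: pairs of values within a claim, weighted by 1/(m-1).
    o = [[0.0] * k for _ in range(k)]
    for verdicts in units:
        m = len(verdicts)
        if m < 2:
            continue
        counts = Counter(RANK[v] for v in verdicts)
        for c, nc in counts.items():
            for d, nd in counts.items():
                pairs = nc * (nc - 1) if c == d else nc * nd
                o[c][d] += pairs / (m - 1)

    n_c = [sum(row) for row in o]  # marginal totals per bucket
    n = sum(n_c)

    def delta2(c, d):
        # Ordinal metric: squared count of ratings lying between the two ranks.
        lo, hi = min(c, d), max(c, d)
        if lo == hi:
            return 0.0
        return (sum(n_c[lo:hi + 1]) - (n_c[lo] + n_c[hi]) / 2) ** 2

    if n <= 1:
        return float("nan")
    d_obs = sum(o[c][d] * delta2(c, d) for c in range(k) for d in range(k)) / n
    d_exp = sum(n_c[c] * n_c[d] * delta2(c, d)
                for c in range(k) for d in range(k)) / (n * (n - 1))
    if d_exp == 0:
        return float("nan")  # undefined when every verdict is the same bucket
    return 1.0 - d_obs / d_exp


def polar_opposite_share(units):
    """Fraction of claims with at least one True and at least one False."""
    polar = sum(1 for v in units if "True" in v and "False" in v)
    return polar / len(units)
```

With this metric a True vs Mostly True split costs far less than a True vs False split, so two panels with the same number of disagreeing claims can still get quite different α values.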
| ||||||||