kostaj 14 minutes ago

Good idea about publishing inter-model variance data! Will include it in the next version. Even if we set aside the two middle buckets (Mostly True and Misleading), which are somewhat subject to interpretation and hedging, at least two models still give polar-opposite verdicts on 21% of the claims (one model says True and another says False).

vlovich123 7 minutes ago | parent [-]

Of those 21%, how many are time-dependent questions about events after the model’s training cutoff that require research to verify? Like the “did Ukraine attack Russia in the past week” question?