kostaj 2 hours ago

Two of the models used have retrieval capabilities and can access newer information through search. The other three are purely parametric, answering only from what they learned in training.

simonw 2 hours ago | parent | next [-]

Comparing models with search tools to models without - when there's no option for "I am unable to answer this question without access to search" - doesn't make sense to me.

kostaj an hour ago | parent [-]

Agreed about comparing models with and without search capabilities. Even the two models with search capabilities (Sonar Pro and Gemini) agree on only 58% of the claims.

furyofantares 2 hours ago | parent | prev | next [-]

Yes, so in that case you set them up to disagree and then measured disagreement.

throw310822 2 hours ago | parent | prev [-]

The title mentions "fact-checks", but "fact checking" is a process in which facts are checked against sources, not one where you are given a random fact and have to tell from your own memory whether it's true or false. That's what is normally called a quiz game. So a more honest title for this research would be "Models answer quiz questions differently".