apercu 4 hours ago

Has anyone else used LLMs to fact-check other LLMs?

I hate to say it, but Gemini lies less frequently than the paid models from OpenAI and Anthropic (OpenAI is the worst in my use cases).

My guess is that Google has better training data (and uses less synthetic data, which might be creating training feedback loops in other models) and has more of a "be calibrated" model than a "be helpful" model. But it could just be that they lean more on RAG than on the model's weights.

But I really shouldn't speculate about the "why," as I'm out of my domain. Just curious whether others use all the models they can and compare outputs as much as I do.