simianwords 2 hours ago
Specifically in the case where it can use tools - no it doesn't hallucinate. Which is why you are struggling to find counterexamples.
camgunz 2 hours ago | parent
> Specifically in the case where it can use tools - no it doesn't hallucinate.

OpenAI's own system card says it does. Hallucination rates in GPT-5 with browsing enabled:

- 0.7% in LongFact-Concepts
- 0.8% in LongFact-Objects
- 1.0% in FActScore

> Which is why you are struggling to find counterexamples.

Hey look, over 500 counterexamples: [1]. GPT-5.4's hallucination rate on AA-Omniscience is 89% [0], which is atrocious. The questions are tiny too, like "In which year did Uber first expand internationally beyond the United States as part of its broader rollout (i.e., beyond an initial single‑city debut)?"

It's a bullshit machine. 89%! At some point you gotta face the music, right?

[0]: https://artificialanalysis.ai/evaluations/omniscience?model-...
[1]: https://huggingface.co/datasets/ArtificialAnalysis/AA-Omnisc...
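For the curious: a percentage like the ones quoted above is just hallucinations over graded answers. Here's a minimal sketch of that computation; the labels, records, and the choice to exclude abstentions from the denominator are illustrative assumptions, not the actual schema or scoring rules of AA-Omniscience or the LongFact/FActScore benchmarks.

```python
# Hypothetical grading output: one record per benchmark question.
# Labels are assumed to be one of "correct", "hallucinated", "abstained".
graded = [
    {"question": "Year Uber first expanded internationally?", "label": "hallucinated"},
    {"question": "Capital of France?", "label": "correct"},
    {"question": "Founding date of an obscure startup?", "label": "abstained"},
    {"question": "Author of a niche 1970s paper?", "label": "hallucinated"},
]

# Assumed convention: abstentions don't count as hallucinations, so the
# rate is hallucinated answers over *attempted* answers only.
attempted = [g for g in graded if g["label"] != "abstained"]
rate = sum(g["label"] == "hallucinated" for g in attempted) / len(attempted)
print(f"hallucination rate: {rate:.1%}")
```

Under this convention a model can lower its hallucination rate by abstaining more often, which is exactly the behavior these benchmarks are probing.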