camgunz 2 hours ago
> Specifically in the case where it can use tools - no it doesn't hallucinate.

OpenAI's own system card says it does. Hallucination rates in GPT-5 with browsing enabled:

- 0.7% on LongFact-Concepts
- 0.8% on LongFact-Objects
- 1.0% on FActScore

> Which is why you are struggling to find counterexamples.

Hey look, over 500 counterexamples: [1]. GPT-5.4's hallucination rate on AA-Omniscience is 89% [0], which is atrocious. The questions are tiny too, like "In which year did Uber first expand internationally beyond the United States as part of its broader rollout (i.e., beyond an initial single‑city debut)?" It's a bullshit machine. 89%! At some point you gotta face the music, right?

[0]: https://artificialanalysis.ai/evaluations/omniscience?model-...

[1]: https://huggingface.co/datasets/ArtificialAnalysis/AA-Omnisc...
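For what it's worth, a "hallucination rate" on a benchmark like this is usually just wrong answers as a fraction of attempted answers, with abstentions excluded. A minimal Python sketch of that tally; the grading rule (exact-match comparison) and abstention handling here are my assumptions, not Artificial Analysis's actual AA-Omniscience methodology:

```python
# Sketch: hallucination rate = wrong answers / attempted answers.
# The exact-match grader and abstention handling are illustrative
# assumptions, not the benchmark's real scoring pipeline.

def hallucination_rate(results):
    """results: list of (model_answer, gold_answer, abstained) tuples."""
    attempted = [r for r in results if not r[2]]  # abstentions don't count
    wrong = [r for r in attempted
             if r[0].strip().lower() != r[1].strip().lower()]
    return len(wrong) / len(attempted) if attempted else 0.0

# Example: 8 of 9 attempted answers wrong, 1 abstention -> ~89%.
sample = ([("2012", "2011", False)] * 8
          + [("2011", "2011", False), ("", "2011", True)])
print(f"{hallucination_rate(sample):.0%}")  # 89%
```

Under a definition like this, a model that confidently answers everything gets punished for every miss, while one that abstains when unsure doesn't, which is exactly the behavior these benchmarks are trying to measure.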
simianwords 2 hours ago
You had to go all the way to benchmark results that specifically stress-test this; you couldn't come up with a single example yourself. And you linked an example where the model was not allowed to use tools, when I specifically said it should be able to use tools. I'm not sure why you present this as though it's a big gotcha. I think my main point pretty much stands.