▲ | sp1982 5 days ago | |
This makes sense. I recently did an experiment to test GPT5 on hallucinations on cricket data where there is a lot of statistical pressure. It is far better to say idk than a wrong answer. Most current benchmarks don’t test for that. https://kaamvaam.com/machine-learning-ai/llm-eval-hallucinat... |