Remix clone Hacker News

wongarsu 7 hours ago
		It's also third best overall on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5 or Fable. That's the one benchmark that allows LLMs to answer "I don't know" and punishes them for trying to bullshit their way through the questions