dom96 5 hours ago

Why do none of the benchmarks test for hallucinations?

tedsanders 3 hours ago | parent | next

In the text, we did share one hallucination benchmark: on a set of error-prone ChatGPT prompts we collected, claim-level errors fell by 33% and responses containing an error fell by 18% (though of course the rate will vary a lot across different types of prompts).

Hallucinations are the #1 problem with language models and we are working hard to keep bringing the rate down.

(I work at OpenAI.)
