tedsanders 3 hours ago
In the text, we did share one hallucination benchmark: on a set of error-prone ChatGPT prompts we collected, claim-level errors fell by 33% and responses containing an error fell by 18% (though of course the rate will vary a lot across different types of prompts). Hallucinations are the #1 problem with language models, and we are working hard to keep bringing the rate down. (I work at OpenAI.)