Remix clone Hacker News

	▲	zihotki 6 hours ago
		I wonder if this benchmark brings any value. Models are already quite capable and reach high scores on it.
	▲	khurdula 5 hours ago
		Check out the "The JSON-pass vs Value-Accuracy gap" section in the blog. That was an eye-opener. While most models were great at producing JSON that passes the schema, they were pretty bad at producing accurate values. In the graph you'll see an almost 20%-30% drop between the JSON schema pass rate and the value accuracy.
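
		To make the distinction concrete, here is a minimal sketch of how the two numbers can diverge. This is not the benchmark's actual scoring code: the schema, the field names, the sample outputs, and the exact-match rule are all assumptions made up for illustration.

```python
import json

# Hypothetical expected structure: field name -> required Python type.
SCHEMA = {"name": str, "age": int, "city": str}


def passes_schema(raw: str) -> bool:
    """True if raw parses as a JSON object with exactly the expected keys and types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != set(SCHEMA):
        return False
    # Strict type check so that JSON true/false is not accepted as an int.
    return all(type(obj[key]) is expected for key, expected in SCHEMA.items())


def value_accuracy(raw: str, gold: dict) -> float:
    """Fraction of gold fields reproduced exactly; 0.0 if the schema check fails."""
    if not passes_schema(raw):
        return 0.0
    obj = json.loads(raw)
    return sum(obj[key] == value for key, value in gold.items()) / len(gold)


def score(outputs: list, golds: list) -> tuple:
    """Return (schema pass rate, mean value accuracy) over all samples."""
    n = len(outputs)
    pass_rate = sum(passes_schema(o) for o in outputs) / n
    accuracy = sum(value_accuracy(o, g) for o, g in zip(outputs, golds)) / n
    return pass_rate, accuracy


# Made-up sample data: one gold record, three hypothetical model outputs.
gold = {"name": "Ada", "age": 36, "city": "London"}
outputs = [
    '{"name": "Ada", "age": 36, "city": "London"}',  # valid, all values right
    '{"name": "Ada", "age": 63, "city": "Paris"}',   # valid, two values wrong
    "not json",                                       # fails to parse
]
pass_rate, accuracy = score(outputs, [gold] * len(outputs))
print(f"schema pass rate: {pass_rate:.2f}, value accuracy: {accuracy:.2f}")
```

		The second output is the interesting case: it passes the schema check, so it counts fully toward the pass rate, but only one of its three values is correct. A benchmark that reports only schema validity would not see that.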