Remix clone Hacker News

	▲	AIhumanbench 8 hours ago
		aihumanbench.com
	▲	rad-b 8 hours ago
		Seems interesting, but testing myself only yields my own results. How would I compare them to a frontier model? That part seems to be missing. Also, the tests seem heavily skewed toward what LLMs are good at.