Remix clone Hacker News

	▲	zone411 20 hours ago
		I’ve tested this model on four of my benchmarks: https://github.com/lechmazur/buyout_game 10th out of 36. https://github.com/lechmazur/pact/ 14th out of 25. https://github.com/lechmazur/nyt-connections/ 60th out of 81. https://github.com/lechmazur/debate 16th out of 29.
	▲	baxtr 3 hours ago
		Good stuff! Is there a reason you change the leaderboard graphs for the third and fourth ones? Also, it would be great to have an overview page with a summary across all tests, like a total score or similar.
	▲	CamperBob2 4 hours ago
		Would be interesting to see the 27B dense Qwen 3.6 model thrown into the mix.