BoorishBears 3 days ago
You seem to be missing that the release announcement does have a really ridiculous graph, and their comment is right to call it out:

- For refusals, they broke out each model's percentage.
- For "% of Questions Correct by Category," they literally grouped an unnamed set of models, averaged their scores, and combined them as "Other"...

That's hilariously sketchy.

It's also strange that the "Questions Correct" graph includes creativity and writing. Those categories don't have correct answers, only win rates, so they don't really fit on the same graph.