Remix clone Hacker News

	▲	kalkin 6 hours ago
		Scale AI wrote a paper a year ago comparing various models' performance on benchmarks with their performance on similar but held-out questions. Generally the closed-source models performed better, and Mistral came out looking pretty bad: https://arxiv.org/pdf/2405.00332