| ▲ | kalkin 6 hours ago | |
Scale AI wrote a paper a year ago comparing various models performance on benchmarks to performance on similar but held-out questions. Generally the closed source models performed better, and Mistral came out looking pretty badly: https://arxiv.org/pdf/2405.00332 | ||