bnmik 11 hours ago
I am working on Desiderata (https://github.com/github-of-NMI/Desiderata), an LLM benchmark with secret questions, for open-weight models only. Each question is asked multiple times to calculate a consistency score. The results are available as JSON: for each question, the hash of the question, the number of correct and incorrect answers, the number of unique answers, and the number of times no answer was given. (Answers use \boxed{}.)
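For anyone curious what a per-question record like that could look like, here is a rough sketch in Python. It is not taken from the Desiderata repo: the field names, the use of SHA-256 for the hash, and the consistency formula (share of runs that agree with the most common answer) are all assumptions made for illustration.

```python
import hashlib
import re
from collections import Counter

# Matches \boxed{...} with no nested braces (a simplification for this sketch).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(response):
    # Return the content of the last boxed answer in a response, or None.
    matches = BOXED.findall(response)
    return matches[-1].strip() if matches else None


def summarize(question, reference, responses):
    # Build one result record from repeated runs of the same question.
    # All key names below are hypothetical, not the project's real schema.
    answers = [extract_boxed(r) for r in responses]
    given = [a for a in answers if a is not None]
    correct = sum(a == reference for a in given)
    counts = Counter(given)
    # Assumed consistency metric: fraction of all runs that produced the
    # single most common answer (runs with no answer count against it).
    top = counts.most_common(1)[0][1] if counts else 0
    return {
        "question_hash": hashlib.sha256(question.encode("utf-8")).hexdigest(),
        "correct": correct,
        "incorrect": len(given) - correct,
        "unique_answers": len(counts),
        "no_answer": len(answers) - len(given),
        "consistency": top / len(responses) if responses else 0.0,
    }


# Example: four runs of one question whose reference answer is "42".
runs = [
    "So the result is \\boxed{42}",
    "I think \\boxed{42}.",
    "\\boxed{41}",
    "I am not sure.",
]
record = summarize("What is 6 * 7?", "42", runs)
```

Publishing only the hash keeps the question text secret while still letting readers compare per-question results across models.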