theamk 12 hours ago
So they use LLMs to evaluate LLMs: one LLM writes the questions, another writes the country-specific answers, and yet another infers the country from each answer. The only manual step seems to be "manually reviewed [questions] to remove repetitions or accidental location references." This seems like a pretty lazy methodology: if there are LLM-specific country biases, they could be introduced at any stage of the process.