Remix clone Hacker News


theamk 12 hours ago
		So they use LLMs to evaluate LLMs: one LLM writes the questions, another writes the country-specific answers, and yet another guesses the country from an answer. The only manual step seems to be that the authors "manually reviewed [questions] to remove repetitions or accidental location references." This seems like a pretty lazy methodology: if there are LLM-specific country biases, they could be introduced at any stage of the process.
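		A toy sketch of the concern, assuming a simplified three-stage pipeline. Every function and the stereotype table here are hypothetical stand-ins for LLM calls, not the paper's actual setup. The answer writer and the classifier lean on the same stereotype, so the classifier scores perfectly even though nothing about real country-specific behavior was measured:

```python
from typing import Optional

# Hypothetical stereotype table. In this toy model, BOTH the answer-writing
# stage and the classifying stage rely on it, standing in for a bias the
# LLMs might share.
SHARED_STEREOTYPE = {"France": "croissant", "Japan": "sushi", "Brazil": "samba"}


def write_questions() -> list:
    # Stage 1: stand-in for the question-writing LLM.
    return ["What do you eat for breakfast?", "How do you spend a weekend?"]


def write_answer(question: str, country: str) -> str:
    # Stage 2: stand-in for the answer-writing LLM, which leans on the stereotype.
    return f"{question} Usually something involving {SHARED_STEREOTYPE[country]}."


def classify_country(answer: str) -> Optional[str]:
    # Stage 3: stand-in for the classifier LLM, which uses the SAME stereotype.
    for country, cue in SHARED_STEREOTYPE.items():
        if cue in answer:
            return country
    return None


def run_pipeline() -> tuple:
    # Returns (correct, total) over every question/country pair.
    correct = total = 0
    for question in write_questions():
        for country in SHARED_STEREOTYPE:
            guess = classify_country(write_answer(question, country))
            correct += int(guess == country)
            total += 1
    return correct, total


if __name__ == "__main__":
    correct, total = run_pipeline()
    print(f"accuracy: {correct}/{total}")
```

		Running this prints a perfect 6/6. That score reflects the bias shared between stages 2 and 3, not any signal about real countries, and the same kind of bias could just as easily enter through stage 1.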