If we're being completely honest, a benchmark is like an honest exam: any set of questions can only be used once when it comes out. Otherwise you're only testing how well people can acquire and memorize exact questions.