A single query isn't going to be a useful benchmark when people are deploying AI swarms in loops to solve simple problems.