In ML, it's pretty classic actually. You train on one set, and evaluate on another set. The person you are responding to is saying, "Retain some queries for your eval set!"