How many of the answers were in the training data?
Isn't this like saying that a spellchecker is "very smart" because it did well at a spelling bee? It isn't smart; it just has a list of answers.