Mixed results indeed. While it leads the benchmark in two question types, it falls short in the others, which results in a slight overall regression.