You’re completely overrating these benchmarks, and it’s leading you to a nonsense opinion. Just actually use the models and you’ll see that the gap is significant.