Benchmarks are always fishy; you need to look at the things you'd actually use the model for in the real world. From that point of view, the SOTA for open models is quite far behind.