These OSS model makers need to stop benchmarking against old models. Showing how it performs against Opus 4.5, GLM-5 when we have Opus 4.6 and GLM-5.1 just tells me that it's not comparable to SOTA.