The benchmark scores this article gives for the GLM and Qwen models match those of GLM 5.2 and Qwen 3.5, so the article is probably up to date.