| ▲ | kingstnap 9 hours ago | |
According to many benchmarks, this model is straight-up frontier level, and Zai seriously cooked. Some of these numbers are incredible. Excited to see if this turns out to be an open-weight Opus 4.5 or better. | ||
| ▲ | andai 8 hours ago | |
The only benchmark that matters is your actual task. I've had models that benched poorly but performed great, and I constantly see models near the top of AA that are terrible. There doesn't seem to be much overlap between benchmarks and real-world usage. (Let alone common sense!) As far as benchmarks go, though, these harder ones match my experience more closely: and https://cognition.ai/blog/frontier-code where we see "top" models drop way down in score when given longer tasks. That being said, I've had a reasonably pleasant time with GLM-5.2 so far. (And I've had an OK time with DeepSeek as well.) By the time I'm done testing all the Chinese models, they'll be obsolete :) | ||
| ▲ | adastra22 4 hours ago | |
According to reports in this thread it is somewhere between Opus 4.7 and 4.8. This is effectively frontier. | ||