coder543 3 hours ago

I would not say a full year — not even close. GLM-5 is very near the frontier: https://artificialanalysis.ai/

Artificial Analysis isn't perfect, but it is an independent third party that actually runs the benchmarks itself, and it uses a wide range of them. It is a better automated litmus test than any other I've found in years of watching LLM development. And the gap has been shrinking rapidly: https://www.youtube.com/watch?v=0NBILspM4c4&t=642s
zozbot234 2 hours ago | parent

Benchmarks are always fishy; you need to look at how the model performs on the things you'd actually use it for in the real world. From that point of view, the SOTA for open models is still quite far behind.