grog454 4 days ago
I guess there are two things I'm still stuck on:

1. What is the purpose of the benchmark?
2. What is the purpose of publicly discussing a benchmark's results but keeping the methodology secret?

To me it's in the same spirit as claiming to have defeated AlphaZero but refusing to share the game.
nl 4 days ago
1. The purpose of the benchmark is to choose which models I use for my own system(s). This is extremely common practice in AI: I think every company I've worked with doing LLM work in the last 2 years has done this in some form.

2. I discussed that up-thread, but https://github.com/microsoft/private-benchmarking and https://arxiv.org/abs/2403.00393 discuss some further motivation for this if you are interested.

> To me it's in the same spirit as claiming to have defeated AlphaZero but refusing to share the game.

This is an odd way of looking at it. There is no "winning" at benchmarks; a benchmark is simply a better and more repeatable evaluation than the old "vibe test" that people did in 2024.