nl 4 days ago

As ChatGPT said to you:

> A secret benchmark is: Useful for internal model selection

That's what I'm doing.

grog454 4 days ago

My question was "What's the value of a secret benchmark to anyone but the secret holder?"

The root of this whole discussion was a post about how Gemini 3 outperformed other models on some presumably informal question benchmark (a "vibe test"?). When asked for the benchmark, the response from the OP and someone else was that secrecy was needed to protect the benchmark from contamination. I'm skeptical of the need for secrecy in the OP's case, and I'm skeptical of its effectiveness in general. In a case where secrecy has actual value, why discuss the benchmark publicly at all?