akoboldfrying | 4 days ago
I actually think "concealing the question" is not only a good idea, but a rather general and powerful idea that should be much more widely deployed (but often won't be, for what I consider "emotional reasons").

Example: You are probably already aware that almost any metric you try to use to measure code quality can be easily gamed. One possible strategy is to choose a weighted mixture of metrics and conceal the weights. The weights can even change over time. Is it perfect? No. But it's at least correlated with code quality -- and it's not trivially gameable, which puts it above most individual public metrics.
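A minimal sketch of that idea, assuming a few toy, made-up metrics (`comment_ratio`, `short_lines`, `has_tests` are illustrative, not anything the commenter proposed); the point is only that the weights stay private and get redrawn over time:

```python
import random

def measure(code: str) -> dict[str, float]:
    """Hypothetical per-snippet metrics, each normalized to [0, 1]."""
    lines = code.splitlines() or [""]
    avg_len = sum(len(l) for l in lines) / len(lines)
    return {
        "comment_ratio": sum(l.strip().startswith("#") for l in lines) / len(lines),
        "short_lines": 1.0 - min(avg_len / 120.0, 1.0),
        "has_tests": float("def test_" in code),
    }

class ConcealedScorer:
    """Scores code with a weighted mix of metrics; the weights are never
    exposed and are periodically re-drawn, so no single metric can be
    targeted and gamed."""

    METRICS = ("comment_ratio", "short_lines", "has_tests")

    def __init__(self, seed: int | None = None):
        self._rng = random.Random(seed)
        self.reweight()

    def reweight(self) -> None:
        # Draw fresh random weights and normalize them to sum to 1.
        raw = {name: self._rng.random() for name in self.METRICS}
        total = sum(raw.values())
        self._weights = {k: v / total for k, v in raw.items()}  # kept secret

    def score(self, code: str) -> float:
        m = measure(code)
        return sum(self._weights[k] * m[k] for k in self.METRICS)

scorer = ConcealedScorer()
print(round(scorer.score("def test_add():\n    # adds\n    assert 1 + 1 == 2\n"), 3))
scorer.reweight()  # the target moves over time, which is what frustrates gaming
```

Only the final score is ever published; anyone trying to optimize for it has to improve the underlying metrics broadly rather than overfit to one.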
grog454 | 4 days ago | parent
It's hard to have any certainty around concealment unless you are only testing local LLMs. As a matter of principle I assume the input and output of any query I run against a remote LLM is permanently public information (same with search queries). Will someone (or some system) see my query and think "we ought to improve this"? I have no idea, since I don't work on these systems. In some instances involving random sampling... probably yes! This is the second reason I find the idea of publicly discussing secret benchmarks silly.
| ||||||||