grog454 4 days ago
It's hard to have any certainty around concealment unless you are only testing local LLMs. As a matter of principle, I assume the input and output of any query I run against a remote LLM is permanently public information (same with search queries). Will someone (or some system) see my query and think "we ought to improve this"? I have no idea, since I don't work on these systems. In cases where queries are randomly sampled for review... probably yes! This is the second reason I find the idea of publicly discussing secret benchmarks silly.
grog454 4 days ago | parent
I learned in another thread that there is some work being done on avoiding contamination of training data when evaluating remote models, using trusted execution environments (https://arxiv.org/pdf/2403.00393). It requires the model owner's participation.
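
Roughly, the idea is: the model owner runs inference inside an enclave, the enclave attests to exactly what code it is running, and the evaluator releases the secret prompts only after verifying that attestation, so the prompts never pass through the owner's normal logging pipeline. Here's a toy sketch of that handshake, simulated locally in Python; SimulatedEnclave, the HMAC "signature", and the keys are all stand-ins for real hardware attestation (SGX/SEV quotes signed by the vendor), not the paper's actual protocol:

    import hashlib
    import hmac

    # Hypothetical stand-ins: a real deployment would verify a
    # hardware-signed quote against the vendor's attestation service.
    VENDOR_KEY = b"hypothetical-vendor-signing-key"
    AUDITED_CODE = b"def infer(prompt): ..."  # inference code the evaluator audited
    EXPECTED_MEASUREMENT = hashlib.sha256(AUDITED_CODE).hexdigest()

    class SimulatedEnclave:
        """Stand-in for a hardware enclave running the model owner's code."""

        def __init__(self, code: bytes):
            self.measurement = hashlib.sha256(code).hexdigest()

        def get_quote(self) -> dict:
            # Real TEEs return a hardware-signed attestation report; we
            # fake the signature with an HMAC under the vendor key.
            sig = hmac.new(VENDOR_KEY, self.measurement.encode(),
                           hashlib.sha256).hexdigest()
            return {"measurement": self.measurement, "signature": sig}

        def query(self, prompt: str) -> str:
            return f"response to {prompt!r}"  # model inference goes here

    def verify_quote(quote: dict) -> bool:
        """Release prompts only if the enclave runs the audited code."""
        expected_sig = hmac.new(VENDOR_KEY, quote["measurement"].encode(),
                                hashlib.sha256).hexdigest()
        return (quote["measurement"] == EXPECTED_MEASUREMENT
                and hmac.compare_digest(expected_sig, quote["signature"]))

    enclave = SimulatedEnclave(AUDITED_CODE)
    assert verify_quote(enclave.get_quote())
    # Only after verification do the secret benchmark items go out, so the
    # owner's training pipeline never sees them in the clear.
    answers = [enclave.query(p) for p in ["secret benchmark item 1"]]

The part that requires the model owner's cooperation is the first step: they have to run inference inside an enclave the evaluator can attest, which is exactly why this doesn't help against a provider that simply declines.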