namaria 12 hours ago
A benchmark that can be gamed cannot be protected from gaming by 'security through obscurity'. Besides, this whole line of reasoning is preempted by the mathematical limits on computation and on transformers anyway. There's plenty published about that. Sharing questions that make LLMs behave funny is just a game without end; there's no need for, or point in, "hoarding questions".