sillyfluke 15 hours ago
Unless I'm missing something glaringly obvious, someone voluntarily labeling a certain prompt as one of their key benchmark prompts should be far more commercially valuable than a model provider trying to ascertain that fact from all the prompts you enter into it. EDIT: I guess they can track identical prompts from multiple unrelated users to deduce that it's some sort of benchmark, but at least that costs them something, however little it might be.
Xmd5a 6 hours ago
I wrote an anagrammatic poem that poses an enigma, asking the reader: "who am I?" The text progressively reveals its own principle as the poem reaches its conclusion: each verse is an anagrammatic recombination of the recipient's name, and the verses state this principle more and more literally. The last 4 lines translate to: "If no word vice slams your name here, it's via it, vanquished as such, omitted." All 4 lines are anagrams of the same person's name.

LLMs haven't figured this out yet (although they're getting closer). They also fail to recognize that this is a cryptographic scheme respecting Kerckhoffs's Principle. The poem itself explains how to decode it: you can determine that the recipient's name is the decryption key because the encrypted form of the message (the poem) reveals its own decoding method. The recipient must bear the name to recognize it as theirs and to understand that this is the sole content of the message, essentially a form of vocative cryptography.

LLMs also don't take the extra step of conceptualizing this as a covert communication method: broadcasting a secret message without prior coordination. And they miss what this implies for alignment if superintelligent AIs were to pursue this approach: manipulating trust by embedding self-referential instructions, like this poem, that only certain recipients can "hear."
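For what it's worth, the decoding step itself is easy to check mechanically. Here's a minimal sketch in Python; the verses and candidate name are placeholders (not the actual poem), and the normalization choices, folding case and accents and dropping spaces and punctuation, are my assumptions about how the anagrams are meant to be read:

    from collections import Counter
    import unicodedata

    def letter_counts(text):
        # Fold accented characters to plain ASCII, lowercase, and keep
        # only letters (spaces and punctuation are ignored).
        folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
        return Counter(c for c in folded.lower() if c.isalpha())

    def is_anagram_of(line, name):
        return letter_counts(line) == letter_counts(name)

    # Placeholder verses and key: a candidate name "decrypts" the poem
    # when every verse is an anagram of it.
    verses = ["Example verse one", "Example verse two"]
    candidate = "Recipient Name"
    print(all(is_anagram_of(v, candidate) for v in verses))

The Kerckhoffs point is that a checker like this can be fully public: the only secret is the name that makes every verse match, and only its bearer has a reason to try it.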