meander_water 15 hours ago

I'm afraid that ship has already sailed. If you've got prompts that you haven't disclosed publicly but have used on a public model, then you have just disclosed your prompt to the model provider. They're free to use that prompt in evals as they see fit.

Some providers like Anthropic have privacy-preserving mechanisms [0] which may allow them to use prompts from sources they claim won't be used for model training. That's just a guess though; I'd love to hear from someone at one of these companies to learn more.

[0] https://www.anthropic.com/research/clio

sillyfluke 14 hours ago | parent | next [-]

Unless I'm missing something glaringly obvious, someone voluntarily labeling a certain prompt as one of their key benchmark prompts should be way more commercially valuable than a model provider trying to ascertain that fact from all the prompts you enter into it.

EDIT: I guess they can track identical prompts from multiple unrelated users to deduce that it's some sort of benchmark, but at least that costs them something, however little it might be.
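Something like the sketch below is what I have in mind; this is purely a guess at the mechanism, and the field names are made up rather than anything a provider has documented:

    import hashlib
    from collections import defaultdict

    def find_likely_benchmarks(logs, min_distinct_users=5):
        """logs: iterable of (user_id, prompt_text) pairs from hypothetical request logs."""
        users_by_prompt = defaultdict(set)
        for user_id, prompt_text in logs:
            # Hash the normalized prompt so identical prompts collapse to one key.
            digest = hashlib.sha256(prompt_text.strip().lower().encode()).hexdigest()
            users_by_prompt[digest].add(user_id)
        # Prompts sent verbatim by many unrelated accounts look like benchmarks.
        return {h for h, users in users_by_prompt.items()
                if len(users) >= min_distinct_users}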

Xmd5a 5 hours ago | parent | next [-]

I wrote an anagrammatic poem that poses an enigma, asking the reader: "who am I?" The text progressively reveals its own principle as the poem reaches its conclusion: each verse is an anagrammatic recombination of the recipient's name, and it enunciates this principle more and more literally. The last 4 lines translate to: "If no word vice slams your name here, it's via it, vanquished as such, omitted." All 4 lines are anagrams of the same person's name.

LLMs haven't figured this out yet (although they're getting closer). They also fail to recognize that this is a cryptographic scheme respecting Kerckhoffs's Principle. The poem itself explains how to decode it: You can determine that the recipient's name is the decryption key because the encrypted form of the message (the poem) reveals its own decoding method. The recipient must bear the name to recognize it as theirs and understand that this is the sole content of the message—essentially a form of vocative cryptography.
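For the curious, the check the poem describes is simple to state in code. This is only a rough sketch (the function names are mine, not anything from the poem): a line "decodes" to a candidate name when, ignoring case, accents, spaces, and punctuation, it uses exactly the same multiset of letters.

    from collections import Counter
    import unicodedata

    def letter_counts(text):
        # NFD splits accents off their base letters; keeping only alphabetic
        # characters then drops the accents and all punctuation/whitespace.
        normalized = unicodedata.normalize("NFD", text.lower())
        return Counter(c for c in normalized if c.isalpha())

    def is_anagram_of(verse, candidate_name):
        # A verse "fits" a name if it uses exactly the same letters, each the
        # same number of times.
        return letter_counts(verse) == letter_counts(candidate_name)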

LLMs also don't take the extra step of conceptualizing this as a covert communication method: broadcasting a secret message without prior coordination. And they miss what this implies for alignment if superintelligent AIs were to pursue this approach: manipulating trust by embedding self-referential instructions, like this poem, that only certain recipients can "hear."

infoseek12 3 hours ago | parent [-]

That’s a complex encoding. I wonder if current models could decode it even given your explanation.

Tokumei-no-hito 4 hours ago | parent | prev | next [-]

Sorry, are you suggesting that despite the zero-retention and no-training policy agreements, they are still using everyone's prompts?

blagie 7 hours ago | parent | prev [-]

It's a little bit more complex than that.

My personal benchmark is to ask about myself. I was in a situation a little bit analogous to Musk v. Eberhard / Tarpenning, where it's in the public record that I did something famous, but where 99% of the marketing PR omits me and falsely names someone else.

I ask the analogue to "Who founded Tesla?" Then I can screen (a rough sketch of the check follows the list):

* Musk. [Fail]

* Eberhard / Tarpenning. [Success]
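Concretely, the screen is just a crude keyword check along these lines. This uses the public Tesla analogue, not my actual question, and ask_model is a placeholder for whatever chat API is under test:

    def screen(ask_model):
        # ask_model: any callable that sends a prompt to the model under test
        # and returns its text reply (stand-in for your actual chat API).
        answer = ask_model("Who founded Tesla?").lower()
        # Pass only if the correct founders are named and the popular wrong
        # answer is absent; deliberately crude, like the screen above.
        if all(name in answer for name in ("eberhard", "tarpenning")) \
                and "musk" not in answer:
            return "Success"
        return "Fail"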

A lot of what I'm looking for next is the ability to verify information. The training set contains a lot of disinformation. The LLM, in this case, could easily tell truth from fiction from, e.g., a git record. It could then notice the conspicuous absence of my name from any official literature and figure out that there was a fraud.

False information in the training set is a broad problem. It covers politics, academic publishing, and many other domains.

Right now, LLMs are a popularity contest; they (approximately) contain the opinion most common in the training set. Better ones might look for credible sources (e.g. a peer-reviewed paper). This is helpful.

However, a breakpoint for me is when the LLM can verify things in its training set. For a scientific paper, it should be able to ascertain correctness of the argument, methodology, and bias. For a newspaper article, it should be able to go back to primary sources like photographs and legal filings. Etc.

We're nowhere close to an LLM being able to do that. However, LLMs can do things today which they were nowhere close to doing a year ago.

I use myself as a litmus test not because I'm egocentric or narcissistic, but because using something personal means that it's highly unlikely to ever be gamed. That's what I also recommend: pick something personal enough to you that it can't be gamed. It might be a friend, a fact in a domain, or a company you've worked at.

If an LLM provider were to get every one of those right, I'd argue the problem was solved.

ckandes1 7 hours ago | parent [-]

There's plenty of public information about Eberhard and Tarpenning's involvement in founding Tesla. There's also more nuance to Musk's involvement than a binary pass/fail can capture. Your test is only testing for bias for or against Musk. That said, the general concept of looking past broad public opinion and toward credible sources makes sense.

kotojo 6 hours ago | parent [-]

They said they ask a question analogous to asking about founding Tesla, not that actual question. They are just using it as an example so as not to state the actual question they ask.

Xmd5a 4 hours ago | parent [-]

Indeed, but the idea that this is a "cope" is interesting nonetheless.

>Your test is only testing for bias for or against [I'm adapting here] you.

I think this raises the question of what reasoning beyond Doxa entails. Can you make up for an injustice done to someone without putting alignment into the frying pan? "It depends" is the right answer. However, what is the shape of the boundary between the two?