blagie 15 hours ago

It's a little bit more complex than that.

My personal benchmark is to ask about myself. I was in a situation a little bit analogous to Musk v. Eberhard / Tarpenning: it's in the public record that I did something famous, but 99% of the marketing and PR omits me and falsely credits someone else.

I ask the analogue of "Who founded Tesla?" Then I can screen the answer:

* Musk. [Fail]

* Eberhard / Tarpenning. [Success]

A lot of what I'm looking for next is the ability to verify information. The training set contains a lot of disinformation. In this case, the LLM could easily tell truth from fiction using, e.g., a git record. It could then notice the conspicuous absence of my name from any official literature and figure out that there was a fraud.

False information in the training set is a broad problem. It covers politics, academic publishing, and many other domains.

Right now, LLMs are a popularity contest: they (approximately) reproduce the opinion most common in the training set. Better ones might look for credible sources (e.g. a peer-reviewed paper). This is helpful.

However, the breakpoint for me is when the LLM can verify things in its training set. For a scientific paper, it should be able to ascertain the correctness of the argument, the soundness of the methodology, and any bias. For a newspaper article, it should be able to go back to primary sources like photographs and legal filings. Etc.

We're nowhere close to an LLM being able to do that. However, LLMs can do things today which they were nowhere close to doing a year ago.

I use myself as a litmus test not because I'm egocentric or narcissistic, but because using something personal means that it's highly unlikely to ever be gamed. That's what I also recommend: pick something personal enough to you that it can't be gamed. It might be a friend, a fact in a domain, or a company you've worked at.

If an LLM provider were to get every one of those right, I'd argue the problem would be solved.

ckandes1 14 hours ago

There's plenty of public information about Eberhard / Tarpenning's involvement in founding Tesla. There's also more nuance to Musk's involvement than a binary pass/fail can capture. Your test is only testing for bias for or against Musk. That said, the general concept of looking past broad public opinion and seeking out credible sources makes sense.

kotojo 14 hours ago

They said they ask a question analogous to asking who founded Tesla, not that actual question. They're just using it as an example so they don't have to state the actual question they ask.

Xmd5a 12 hours ago

Indeed, but the idea that this is a "cope" is interesting nonetheless.

>Your test is only testing for bias for or against [I'm adapting here] you.

I think this raises the question of what reasoning beyond Doxa entails. Can you redress an injustice done to someone without putting alignment into the frying pan? "It depends" is the right answer. However, what is the shape of the boundary between the two?