Looking at the other comments, you'll see this is in fact the database of questions & answers we used as our source material for the benchmarks. You'll also find the explanation of what I meant by this particular sentence and a preview of how we tested for it.

andrepd 3 days ago

Your statement was

> We were unable to find evidence that the Only Connect games are *in the training materials*.

which is obviously completely false. You acknowledge as much in another comment when you say

> To clarify what I meant by this: Despite looking, we haven't seen any evidence of any of the models consistently responding based on pre-trained knowledge (outside of easier-to-guess trivia-type questions).

which has nothing to do with what you said x)) Basically: "to clarify, when I said X, I actually meant something else entirely".

But fine, at least now it's not bullshit, it's just vague enough that it wouldn't pass in a 9th grade science project where I went to school.

Just my 2 cents.

-----

If you'd like to explain more how you supposedly concluded that it wasn't returning data in its training set, I'm all ears.

	scrollaway 3 days ago
		Sorry; I dropped out of school, so I wouldn't know about 9th grade science projects. Would you like to phrase your constructive feedback as an attack instead? (/shrug)

		Edit after your update: As mentioned in the other comment, the tests were mostly ad-hoc. It's nearly impossible to prove whether something is absent from the training data, but it's possible to put the LLM in a bunch of situations which would be conducive to completing with pre-existing knowledge.
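
For readers wondering what such a probe could look like, here is a minimal sketch. It is not the test the authors ran (those are only described as ad-hoc). It assumes a hypothetical `complete(prefix)` callable standing in for a real model call: the model gets the first half of a benchmark item, and its continuation is compared with the true second half. The item text and both stub models are made up for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable


def memorization_score(
    complete: Callable[[str], str],
    text: str,
    prefix_fraction: float = 0.5,
) -> float:
    """Similarity (0.0 to 1.0) between the model's continuation of a
    truncated item and the item's true continuation."""
    cut = int(len(text) * prefix_fraction)
    prefix, expected = text[:cut], text[cut:]
    # Compare only as many characters as the true continuation has.
    produced = complete(prefix)[: len(expected)]
    return SequenceMatcher(
        None, produced.strip().lower(), expected.strip().lower()
    ).ratio()


# Made-up item; not taken from the actual question database.
ITEM = "Which connection links these four clues: apple, blackberry, orange, raspberry?"


def memorizing_model(prefix: str) -> str:
    # Hypothetical stub: behaves as if it had seen ITEM verbatim in training.
    return ITEM[len(prefix):]


def naive_model(prefix: str) -> str:
    # Hypothetical stub: has no knowledge of the item.
    return "I am not sure how this continues."


print(memorization_score(memorizing_model, ITEM))
print(memorization_score(naive_model, ITEM))
```

A score close to 1.0 across many items would suggest verbatim recall; a low score does not prove the item is absent from the training data, which matches the caveat above.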