I assume they meant they can't come up with a reasonable justification.

hyperpape 4 hours ago

Thank you, that's correct.

To be perfectly clear, I understand their justification for only _editing_ the executive summary; it is arguably reasonable, because editing the work history would risk altering the details in ways that compromise the measurement. This is a hard problem to solve (you might try reviewing the resumes for hallucinations, but I can't think of a precise study design that doesn't risk problems).

What is, imho, impossible to defend is having the LLM evaluate only the executive summary in isolation, and then reporting that as the LLM preferring resumes it wrote.

What you've shown is that LLMs prefer executive summaries they wrote. But the overall impact on how they will evaluate your entire resume is not measured by this technique.

Worse, this isn't just "decent paper, bad summary", their abstract misreports their findings.

delusional an hour ago

> Worse, this isn't just "decent paper, bad summary", their abstract misreports their findings.

What findings are being misrepresented? Their claims seem supported by their conclusions to me. You can question the generality of their claims based on the limitations of their methods, but that does not amount to "misreporting" the conclusion.

delusional 5 hours ago

I doubt it since they, admittedly, didn't read it. The question he posed, about the paper, is answered in that very same paper. He has structured his whole reply to have the tone of uncovering the hidden caveat in the small print that invalidates the paper, when it's actually a straightforwardly stated assumption in their methodology section.

lunchbucket 3 hours ago

Now that they've confirmed that was in fact what they meant, how have your views on this exchange changed?

delusional 2 hours ago

> how have your views on this exchange changed?

Not at all, because I am critiquing the author's writings, and for those I don't need to speculate on his intentions. He wrote a comment where he misrepresents the arguments in the paper, while explicitly saying he didn't bother to read it. That's not good enough.

The author of said comment now comes in, after getting criticized, and claims that "yes, I meant that all along" and appends a note about not considering it "much" of a justification. He did not question the justification of the paper; his claim was "I can’t come up with a justification", implying the paper has NO justification for the design. His criticism of the abstract as not covering the design of the experiment rings hollow when he can't be bothered to read the paper itself.

That being said, I am happy that he went back and read the justification, and I do think it's valid to question the conclusions drawn from the design of the study. I too wonder whether this result would replicate had the models been provided the entire resume. I too think presenting the model with the entire reconstructed resume would have been a stronger test.