hyperpape 4 hours ago

I'll copy what I wrote on LinkedIn (note: I read roughly 25 pages, which is half the paper, and read it quickly)[0]:

"If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate.

Their actual method seems to be taking a human written resume, deleting the executive summary, having an LLM rewrite the executive summary based on the rest of the resume and then having another LLM rate the executive summary without the rest of the resume.

That’s likely to massively overstate any real impact, if you can even rely on it capturing a real effect.

I really wonder if I read that correctly, because I can’t come up with a justification for that study design."

[0] I couldn't help but mildly copy-edit before pasting here.

Edit: yes, the authors present a reason for their design, and an ideal version of my comment would've said that. I do not consider it much of a justification. See below: https://news.ycombinator.com/item?id=47987256#47987727.

b112 3 hours ago | parent | next [-]

Could be an ad for 'use LLMs more'. A generic ad like this helps all in the market, but if you own 30% of LLM market share, it still helps you 30% of the time.

Now that I think of it, every other industry has an 'advocacy group', whether cheese, oil, or nutmeg. So surely there is now some sort of LLM 'consortium', and group funding studies like this just fuels the FOMO. You can be sure such groups exist, and are pummeling every government in the world thusly. But I bet they're also looking here.

After all, it's a circle. Uh-oh! HR is using LLMs, you'd better too, potential employee! Then later? Uh-oh! The best employees you can hire are using LLMs, you'd better too, HR!

They already FOMOed us into basically everything else, why not LLMs too?

delusional 4 hours ago | parent | prev [-]

[flagged]

aDyslecticCrow 4 hours ago | parent | next [-]

There is some creativity in the rest of the CV, between what kind of experiences are included and how they are described. But that would be far harder to generate fairly.

I think choosing the summary is a fair design choice since it prevents the LLM from just... making up a perfect candidate.

"I'm a fullstack professor of software design with 90 years of experience expecting a junior internship position"

nearbuy 4 hours ago | parent | prev | next [-]

I assume they meant they can't come up with a reasonable justification.

hyperpape 3 hours ago | parent | next [-]

Thank you, that's correct.

To be perfectly clear, I understand their justification for only _editing_ the executive summary, it is arguably reasonable, because editing the work history would risk altering the details in ways that compromise the measurement. This is a hard problem to solve (you might try reviewing the resumes for hallucinations, but I can't think of a precise study design that doesn't risk problems).

What is, imho, impossible to defend, is having the LLM only evaluate the executive summary in isolation, and reporting that as it preferring resumes it wrote.

What you've shown is that LLMs prefer executive summaries they wrote. But the overall impact on how they will evaluate your entire resume is not measured by this technique.

Worse, this isn't just "decent paper, bad summary": their abstract misreports their findings.

delusional 3 hours ago | parent | prev [-]

I doubt it since they, admittedly, didn't read it. The question he posed, about the paper, is answered in that very same paper. He has structured his whole reply to have the tone of uncovering the hidden caveat in the small print that invalidates the paper, when it's actually a straightforwardly stated assumption in their methodology section.

lunchbucket an hour ago | parent [-]

Now that they've confirmed that was in fact what they meant, how have your views on this exchange changed?

delusional 3 minutes ago | parent [-]

> how have your views on this exchange changed?

Not at all, because I am critiquing the author's writings, and for those I don't need to speculate on his intentions. He wrote a comment where he misrepresents the arguments in the paper, while explicitly saying he didn't bother to read it. That's not good enough.

The author of said comment now comes in, after getting criticized, and claims that "yes, I meant that all along" and appends a note about not considering it "much" of a justification. He did not question the justification of the paper; his claim was "I can’t come up with a justification", implying the paper has NO justification for the design. His criticism of the abstract as not covering the design of the experiment rings hollow when he can't be bothered to read the paper itself.

That being said, I am happy that he went back and read the justification, and I do think it's valid to question the conclusions drawn from the design of the study. I too wonder if this result would replicate had the models been provided the entire resume. I too think presenting the model with the entire reconstructed resume would have been a stronger test.

ekianjo 4 hours ago | parent | prev [-]

> They state that unlike the rest of the resume, which is largely factual

largely factual? A resume is usually more than a bunch of dates and titles of positions.