bflesch 5 days ago

Unfortunately the bank account trajectories are not public, because unscrupulous corporations such as FAANG, which let thousands of engineers wade through my chat messages on their platforms, might not shy away from bribing academics to improve the benchmarks of their billion-dollar AI initiatives.

It's also a bribe if my sibling gets a job with $500k annual salary. Tech is not immune to it.

Zacharias030 5 days ago

You realize that this problem in SWE-bench was discovered and publicized by people within those FAANG corporations?

TheDong 5 days ago

I'm sure some of the people working at Theranos thought there legitimately was a revolutionary blood-test machine.

The presence of a person who wants SWE-bench to have honest results and takes it seriously does not mean the results are free of perverse incentives, nor that everyone is behaving just as honestly.

Zacharias030 4 days ago

When SWE-bench was new in 2023, it was, with all due respect, a bit of a niche benchmark in LLM research. LLMs were so incredibly useless at solving these tasks that I think you could find a bit more empathy for the original academic authors. I don't think the Theranos example applies. Even the flawed benchmark was good enough to get us from ~GPT-4 to Claude 4's coding ability.