|
| ▲ | plumb_bob_00 5 days ago | parent | next [-] |
| If you are going to represent your team in public, you owe them better than a response like this. |
| |
|
| ▲ | bflesch 5 days ago | parent | prev | next [-] |
| Unfortunately the bank account trajectories are not public, because unscrupulous corporations such as FAANG who let thousands of engineers wade through my chat messages on their platforms might not shy away from bribing academics to improve benchmarks of their billion-dollar AI initiatives. It's also a bribe if my sibling gets a job with a $500k annual salary. Tech is not immune to it. |
| |
| ▲ | Zacharias030 5 days ago | parent [-] | | You realize that this problem in SWE-Bench was discovered and publicized by people within those FAANG corporations? | | |
| ▲ | TheDong 5 days ago | parent [-] | | I'm sure some of the people working at Theranos thought there legitimately was a revolutionary blood-test machine. The presence of a person who wants SWE-bench to have honest results and takes it seriously does not mean the results are free of perverse incentives, nor that everyone is behaving just as honestly. | | |
| ▲ | Zacharias030 4 days ago | parent [-] | | When SWE-Bench was new in 2023, it was, with all due respect, a bit of a niche benchmark in LLM research. LLMs were so incredibly useless at solving these tasks that I think you could find a bit more empathy for the original academic authors. I don't think the Theranos example applies. Even the flawed benchmark was good enough to get us from ~GPT-4 to Claude 4's coding ability. |
|
|
|
|
| ▲ | phyzome 5 days ago | parent | prev | next [-] |
| That sounds like the job of the person making the claim. |
|
| ▲ | ares623 5 days ago | parent | prev | next [-] |
| They really did a "trust me bro" and "do your own research" huh |
| |
| ▲ | stronglikedan 5 days ago | parent [-] | | the strange thing to me is that people would have it any other way. if you don't trust someone, why would you trust them to do the research for you? bit of entitlement if you ask me | | |
| ▲ | wubrr 5 days ago | parent | next [-] | | Because you should never just 'trust' random 'research'. Good analysis in this case will clearly explain the problem, the analysis methodology, findings, net effects, resolution, etc. Something you can read, and decide for yourself whether it is complete/incomplete, has holes, contradictions, etc. Not 'we looked into it and all is good - only potentially tiny effect' (no actual data or methodology presented at all) and then linking to a comment directly contradicting the claim... It's a hilariously unserious and untrustworthy response. | |
| ▲ | haskellshill 4 days ago | parent | prev | next [-] | | That's silly. If they show their work I won't have to trust them. Compare answering "The answer is 5, just compute it yourself." on a math test, vs. actually showing the calculation. The former clearly implies the person doesn't know what they're talking about. | |
| ▲ | croon 5 days ago | parent | prev | next [-] | | Arguably the initial post was meant to convey confidence and authority on the subject. When questioned you could either dive deeper and explain in more detail why x because of y (if so inclined), ignore it, or... do what they did. No one owes anyone anything, but if you want to represent something, answering the question in more detail would have either closed the issue or raised more scrutiny, both of which are a good thing when trying to figure something out. I don't have to trust someone to check their research and look at how they worked. If the work doesn't pass muster, likely the results don't either. Again, you can view it as entitlement, but if you're not going to bother backing up your claim, why make the claim to start with? | |
| ▲ | aprilthird2021 5 days ago | parent | prev [-] | | It's not that people are entitled. It's that "do your own research" is usually a cop-out when you yourself don't understand the answer or are hiding it |
|
|
|
| ▲ | typpilol 5 days ago | parent | prev | next [-] |
| Are you saying you've done way more than a cursory search and ruled out everything? |
|
| ▲ | 5 days ago | parent | prev [-] |
| [deleted] |