eli 3 hours ago

The other "cheating" examples are even worse. It's wild to me that people keep designing benchmarks where the answer is lying around on disk or in the git history. "Hardening" the benchmark with strongly worded prompt instructions is bizarre. There are so many agent sandbox solutions. Why not use one and give it only access to the code it should see?

And I'm not sure how they can rule out other solutions also benefiting from being in the training data, just not reproduced exactly. Seems like it should focus on only CVEs from the last 30 days or something.
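A minimal sketch of the "only recent CVEs" filter, assuming a hypothetical list of (identifier, publication date) pairs rather than any real feed's schema (NVD and others use their own formats). The 30-day window is the suggestion above, exposed as a parameter; a tighter criterion would compare against the model's training cutoff instead.

```python
from datetime import date, timedelta

def recent_cves(cves, today, window_days=30):
    """Keep only entries published within the last `window_days` days.

    `cves` is a hypothetical list of (cve_id, published_date) pairs;
    the record shape is an assumption, not a real feed's schema.
    """
    cutoff = today - timedelta(days=window_days)
    # Anything published before the cutoff might already be in training data.
    return [cve_id for cve_id, published in cves if published >= cutoff]
```

For example, with `today=date(2024, 6, 1)` an entry published on 2024-01-01 is dropped and one published on 2024-05-20 is kept.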

bensyverson 2 hours ago

100%… the fact that they're just using prompting to discourage the agent from looking ahead in the Git history is wild.

numeri 2 hours ago

To be fair, it is good to know that it disobeys simple instructions like "don't examine my git history" far more than other models. (It should of course be a different benchmark, so as not to conflate things.)

It's not a great sign for alignment.

bensyverson an hour ago

Agreed, alignment is just a separate issue that a vuln fixing benchmark doesn't need to be testing.

fragmede an hour ago

Obviously they could just delete .git for their test if they wanted to. But think of telling the LLM not to use git commands like keeping keys in a .env file and telling the LLM not to read it: if it reads the file anyway, you'd be concerned.
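A minimal sketch of the "just delete .git" approach: copy the working tree into a fresh temporary directory and leave the history behind, so there is nothing to look ahead in. The function name and the `exclude` parameter are hypothetical. This only removes history from what the agent is handed; it is not a sandbox, and the agent would still need to be confined to that directory.

```python
import shutil
import tempfile
from pathlib import Path

def snapshot_without_history(repo_dir, exclude=(".git",)):
    """Copy repo_dir's files into a new temp directory, skipping `exclude` names.

    Names are matched at every directory level, so nested .git entries
    (e.g. submodules) are skipped too. Pass e.g. (".git", ".env") to also
    drop secrets files.
    """
    # copytree requires that the destination not exist yet, so copy into a
    # subdirectory of the freshly created temp dir.
    dest = Path(tempfile.mkdtemp(prefix="bench-")) / "repo"
    shutil.copytree(repo_dir, dest, ignore=shutil.ignore_patterns(*exclude))
    return dest
```

The agent would then be pointed at the returned path instead of the original checkout.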