OtherShrezzing 4 days ago
For the _reviving 20-year-old code_ type tasks, are the tested outcomes things we'd expect to be publicly available? For example, in the way the SWE-bench Verified tests are poisoned, because the LLMs are able to look up bug fixes in the project git repository?
criemen 4 days ago
> because the LLMs are able to look up bug fixes in the project git repository

That's not the (only) problem: even if you take the internet away, we know/assume that all LLMs are heavily trained on public GitHub repositories. Therefore, they know/remember details of the code and its organization in a way they can't for your private code (or for new code written after the knowledge cut-off date).