ProgramBench: Can Language Models Rebuild Programs from Scratch?(github.com)
3 points by fittingopposite 6 hours ago | 1 comment
Kuinox 5 hours ago

I didn't manage to find the tests. How can we know that the tests are actually reasonable in this case?