trollbridge 4 days ago
These days I evaluate models by how good they are at removing code. It’s fairly simple (assuming the test harness and agents.md are well written): have the model iterate on removing code, check that the tests still pass after each removal, then have a human review the result. There's less code to review that way.
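A rough sketch of that loop, in case it helps make it concrete. Everything here is an assumption for illustration: `removal_loop`, `propose_removal` (standing in for the model), and `tests_pass` (standing in for the test harness) are hypothetical names, and stopping at the first failing proposal is a simplification rather than anything the workflow requires.

```python
from typing import Callable, Optional


def removal_loop(
    code: str,
    propose_removal: Callable[[str], Optional[str]],  # hypothetical: the model
    tests_pass: Callable[[str], bool],                # hypothetical: the harness
    max_iters: int = 10,
) -> str:
    """Keep accepting smaller versions of `code` while the tests still pass."""
    for _ in range(max_iters):
        candidate = propose_removal(code)
        if candidate is None or len(candidate) >= len(code):
            break  # nothing smaller was proposed
        if not tests_pass(candidate):
            break  # simplification: stop at the first failing proposal
        code = candidate  # removal accepted
    return code  # this is what goes to the human reviewer


# Toy stand-ins so the sketch runs on its own.
def drop_last_line(src: str) -> Optional[str]:
    lines = src.splitlines()
    return "\n".join(lines[:-1]) if len(lines) > 1 else None


def still_defines_add(src: str) -> bool:
    return "def add" in src and "return a + b" in src


source = "def add(a, b):\n    return a + b\nunused = 1\ndead = 2"
result = removal_loop(source, drop_last_line, still_defines_add)
print(result)
```

In the toy run the two dead assignments are removed and the loop stops once a removal would break the check. A real setup would swap in an actual model call and test run for the two callables.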