How well does it do on frontier models like Opus 4.6?
I have only done functionality testing; no benchmark testing on Opus (decided to pay my rent instead).