gertlabs 4 hours ago
4.5/4.6 were roughly the same in our testing. Opus 4.7 is smarter, but it's difficult to use as a product because of various personality issues. So far, Opus 4.8 seems to be going down that path (unusably slow, though this could be a launch-day rollout problem). Full Opus 4.8 tests are in progress now. Data at https://gertlabs.com/rankings
__s 3 hours ago
"personality issues": I could tell that Opus 4.7 takes instructions more literally, which I appreciated once I calibrated my phrasing to be more precise. (I often ask it to investigate issues; pre-4.7, it'd start making code changes instead of just giving a write-up.) But I can see contexts where its handling of vague prompts would've just been worse.