edg5000 2 hours ago
Wow, looks like you've found a massive flaw indeed. I was skeptical about the results because in my experience the recent GPT and Opus models are both strong. Everything else is B or C tier. This is just artisanal vibe testing though. It's very hard to eval them properly.