edg5000 2 hours ago

Wow, looks like you've found a massive flaw indeed.

I was skeptical about the results because in my experience both recent GPT and Opus models are strong. Everything else is B or C tier. This is just artisanal vibe testing though. It's very hard to eval them properly.