M2 was one of the most benchmaxxed models we've seen. There's a huge gap between its SWE-bench results and how it does on tasks it hasn't been trained on. We'll put 2.5 on the list. https://brokk.ai/power-ranking