aoeusnth1 9 hours ago
SWE-bench Pro is ~20% higher than the previous .1 generation, which was released 2 months ago. On their SWE benchmark, token consumption at iso-performance is down 2x from the model they released 2 months ago. If this is a plateau, I struggle to imagine what you would consider fast progress.