threepts an hour ago
That is why we have SWE-Bench Pro. It tests architecture design too, and it turns out that $1,000 of tokens outperforms $10k of labor in meta design.
SpicyLemonZest an hour ago
That's just not accurate. I haven't studied SWE-Bench Pro in detail, so I can't tell you exactly what the flaw is, but SOTA models routinely make bad architectural choices that I have to intervene to fix.