jasonjmcghee 2 hours ago
That's Grok 4.2, not 4.3, right? And why are you comparing to gpt-4.1, as opposed to one of the 6(?) model releases since then? I would have expected gpt 5.5.
michaelbuckbee 15 minutes ago
Good catch. There was an issue with the second hardest thing in programming (caching). Here's an updated eval with the proper models: https://a3bmfqfom3.evvl.io/