zone411 4 hours ago
Sets a new record on the Extended NYT Connections benchmark: 96.8 (https://github.com/lechmazur/nyt-connections/). For comparison, Grok 4 is at 92.1, GPT-5 Pro at 83.9, and Claude Opus 4.1 Thinking 16K at 58.8. Gemini 2.5 Pro scored 57.6, so this is a huge improvement (+39.2 points).