zone411 2 hours ago

Both Sonnet 4.6 variants improve on their 4.5 counterparts on my Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/).

Sonnet 4.6 Thinking 16K scores 57.6 on the Extended NYT Connections Benchmark. Sonnet 4.5 Thinking 16K scored 49.3.

Sonnet 4.6 No Reasoning scores 55.2. Sonnet 4.5 No Reasoning scored 47.4.
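
If anyone wants the deltas, here is a quick sketch of the arithmetic. The scores are the ones quoted above; the dictionary layout and the `gain` helper are just illustrative names, not part of the benchmark repo.

```python
# (Sonnet 4.5 score, Sonnet 4.6 score) per mode, taken from the numbers above.
scores = {
    "Thinking 16K": (49.3, 57.6),
    "No Reasoning": (47.4, 55.2),
}


def gain(old, new):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    diff = new - old
    return round(diff, 1), round(100 * diff / old, 1)


for mode, (old, new) in scores.items():
    abs_gain, rel_gain = gain(old, new)
    print(f"{mode}: +{abs_gain} points ({rel_gain}% relative)")

# Thinking 16K: +8.3 points (16.8% relative)
# No Reasoning: +7.8 points (16.5% relative)
```

So the jump is roughly the same size with and without reasoning enabled.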