| ▲ | ishurand4 3 hours ago | |
The numbers they show don't matter. "On multi-round coreference/context recall tests (often cited as MRCR or long-text retrieval benchmarks), Opus 4.7 reportedly dropped from roughly 78.3% down to 32.2% compared to Opus 4.6." But what did Anthropic do? They just stopped showing that benchmark altogether and now only show the cherry-picked ones that improved. | ||