jzig 8 hours ago:
At what point along the 1M window does context become "long" enough that this degradation occurs?
daemonologist 8 hours ago (parent):
The benchmark GP mentioned measures at 128k-256k context (there's another at 524k-1024k, where 4.6 scored 78.3% and 4.7 scored 32.2%). The longer the context, the worse the performance; there isn't really a qualitative step change in capability. If there is one, imo it happens around 8k-16k tokens, much sooner than is relevant for multi-turn coding tasks - see e.g. this older benchmark: https://github.com/adobe-research/NoLiMa