jerjerjer 5 months ago:
Is there a benchmark to measure real effective context length? Sure, gpt-4o has a context window of 128k, but it loses a lot from the beginning/middle.
brookst 5 months ago:
Here's an older study that includes Claude 3.5: https://www.databricks.com/blog/long-context-rag-capabilitie...
bigmadshoe 5 months ago:
Model vendors often publish "needle in a haystack" benchmarks that look very good, but my subjective experience with large contexts is always bad. Maybe we need better benchmarks.
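
The basic probe behind those "needle in a haystack" benchmarks is easy to sketch: bury a known fact at varying depths in filler text, then ask the model to retrieve it. A minimal version, assuming the official openai Python client; the model name, needle text, filler size, and depths are placeholder assumptions, not anything from the thread or the Databricks study:

    # Minimal needle-in-a-haystack probe: place a known fact at varying
    # depths inside filler text and check whether the model retrieves it.
    from openai import OpenAI

    client = OpenAI()

    NEEDLE = "The secret launch code is 7493."
    QUESTION = "What is the secret launch code? Answer with the number only."
    FILLER = "The quick brown fox jumps over the lazy dog. " * 200  # noise chunk

    def probe(depth: float, n_chunks: int = 50) -> bool:
        """Insert the needle at `depth` (0 = start, 1 = end) of the haystack."""
        chunks = [FILLER] * n_chunks
        chunks.insert(int(depth * n_chunks), NEEDLE)
        haystack = "\n".join(chunks)
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": f"{haystack}\n\n{QUESTION}"}],
        )
        return "7493" in resp.choices[0].message.content

    # Sweep needle positions; effective-context problems show up as failures
    # at particular depths or once the haystack grows past some length.
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"depth={depth:.2f} retrieved={probe(depth)}")

The catch, and likely why these scores look better than real-world use, is that retrieving one verbatim string is much easier than reasoning over many facts spread across the context.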