	▲	jerjerjer 5 months ago
		Is there a benchmark that measures real effective context length? Sure, GPT-4o has a 128k-token context window, but it loses a lot of information from the beginning and middle of the context.
	▲	brookst 5 months ago
		Here's an older study that includes Claude 3.5: https://www.databricks.com/blog/long-context-rag-capabilitie...?
	▲	evertedsphere 5 months ago
		RULER: https://arxiv.org/abs/2404.06654 and NoLiMa: https://arxiv.org/abs/2502.05167
	▲	bigmadshoe 5 months ago
		Model vendors often publish "needle in a haystack" benchmarks that look very good, but my subjective experience with a large context is always bad. Maybe we need better benchmarks.
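
For anyone who hasn't seen one, here is a rough sketch of what a "needle in a haystack" check looks like. It hides one sentence at different relative depths in filler text and checks whether the answer contains it. Everything here is an assumption for illustration: `build_haystack_prompt`, `recall_by_depth`, and `ask_model` are hypothetical names, the filler is a repeated pangram, context size is counted in words rather than tokens, and `forgetful_model` is a stand-in stub, not a real LLM client. It is not how RULER or NoLiMa are implemented.

```python
def build_haystack_prompt(needle, filler, total_words, depth):
    """Insert `needle` at relative `depth` (0.0 = start, 1.0 = end) of filler text."""
    filler_words = filler.split()
    # Repeat the filler until we have exactly `total_words` words.
    words = [filler_words[i % len(filler_words)] for i in range(total_words)]
    position = int(depth * total_words)
    return " ".join(words[:position] + [needle] + words[position:])


def recall_by_depth(ask_model, secret, total_words, depths):
    """Return {depth: bool}: did the model's answer contain the secret?"""
    needle = f"The secret code is {secret}."
    filler = "The quick brown fox jumps over the lazy dog."
    results = {}
    for depth in depths:
        prompt = build_haystack_prompt(needle, filler, total_words, depth)
        answer = ask_model(prompt + "\n\nWhat is the secret code?")
        results[depth] = secret in answer
    return results


def forgetful_model(prompt):
    """Hypothetical stub: only 'sees' the last 200 characters of the prompt."""
    tail = prompt[-200:]
    return "7421" if "7421" in tail else "unknown"


if __name__ == "__main__":
    print(recall_by_depth(forgetful_model, "7421", 1000, [0.0, 0.5, 1.0]))
```

To try it against a real model, replace `forgetful_model` with any callable that takes a prompt string and returns the model's text answer. Note that this only tests literal retrieval of one planted sentence, which may be part of why such scores look better than day-to-day long-context use feels.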