jzig 8 hours ago:
At what point along the 1M window does context become "long" enough that this degradation occurs?
daemonologist 8 hours ago (parent):
The benchmark GP mentioned measures at 128k-256k context (there's another at 524k-1024k, where 4.6 scored 78.3% and 4.7 scored 32.2%). The longer the context, the worse the performance; there isn't really a qualitative step change in capability. If there is one, imo it happens around 8k-16k tokens, much sooner than is relevant for multi-turn coding tasks - see e.g. this older benchmark: https://github.com/adobe-research/NoLiMa