samwho · 4 hours ago
With KV caching as it’s described there, it has to be a prefix match. OpenAI state in their docs that they don’t cache prompts shorter than 1024 tokens, and I’m sure I read somewhere that they only cache in 1024-token blocks (so 1024, 2048, 3072, etc.), but I can’t find it now. There’s been some research into how to cache chunks in the middle, but I don’t think any of the providers are doing it yet because it needs the prompt to be structured in a very specific way.
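A minimal sketch of what prefix-only caching with block granularity would look like, assuming the behaviour described above (prefix match only, 1024-token minimum, hits rounded down to whole 1024-token blocks). The function name and structure are hypothetical, not OpenAI's actual implementation:

```python
# Hypothetical sketch: prefix-match KV cache lookup with block granularity.
# Assumes a 1024-token minimum cacheable prefix and 1024-token blocks,
# per the behaviour described in the comment above.

BLOCK = 1024
MIN_CACHE = 1024

def cached_prefix_len(prompt_tokens, cached_prompts):
    """Return the length of the longest reusable cached prefix, rounded
    down to a whole number of blocks; 0 if below the minimum size."""
    best = 0
    for cached in cached_prompts:
        # Count how many leading tokens match: prefix match only,
        # so a matching chunk in the middle of the prompt is useless.
        n = 0
        for a, b in zip(prompt_tokens, cached):
            if a != b:
                break
            n += 1
        best = max(best, n)
    # Round down to block granularity; discard sub-minimum matches.
    best = (best // BLOCK) * BLOCK
    return best if best >= MIN_CACHE else 0
```

Under these assumptions, a prompt sharing a 2500-token prefix with a cached one would reuse only 2048 tokens, and anything under 1024 would get no cache hit at all.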
moebrowne · 4 hours ago (parent)
https://platform.openai.com/docs/guides/prompt-caching#requi...

> Caching is available for prompts containing 1024 tokens or more.

No mention of caching being in blocks of 1024 tokens thereafter.