samwho · 4 hours ago
With KV caching as it’s described there, it has to be a prefix match. OpenAI state in their docs that they don’t cache prompts shorter than 1024 tokens, and I’m sure I read somewhere that they only cache in 1024-token blocks (so 1024, 2048, 3072, etc.), but I can’t find it now. There’s been some research into how to cache chunks in the middle, but I don’t think any of the providers are doing it yet because it needs the prompt to be structured in a very specific way.
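A minimal sketch of what prefix-only caching with block granularity would look like, assuming the behaviour described above (prefix match only, 1024-token minimum, hits rounded down to whole 1024-token blocks). The function name and structure are hypothetical, not OpenAI's actual implementation:

```python
# Hypothetical sketch: prefix-match KV cache lookup with block granularity.
# Assumes a 1024-token minimum cacheable prefix and 1024-token blocks,
# per the behaviour described in the comment above.

BLOCK = 1024
MIN_CACHE = 1024

def cached_prefix_len(prompt_tokens, cached_prompts):
    """Return the length of the longest reusable cached prefix, rounded
    down to a whole number of blocks; 0 if below the minimum size."""
    best = 0
    for cached in cached_prompts:
        # Count how many leading tokens match: prefix match only,
        # so a matching chunk in the middle of the prompt is useless.
        n = 0
        for a, b in zip(prompt_tokens, cached):
            if a != b:
                break
            n += 1
        best = max(best, n)
    # Round down to block granularity; discard sub-minimum matches.
    best = (best // BLOCK) * BLOCK
    return best if best >= MIN_CACHE else 0
```

Under these assumptions, a prompt sharing a 2500-token prefix with a cached one would reuse only 2048 tokens, and anything under 1024 would get no cache hit at all.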
moebrowne · 4 hours ago (parent)
https://platform.openai.com/docs/guides/prompt-caching#requi...

> Caching is available for prompts containing 1024 tokens or more.

No mention of caching being in blocks of 1024 tokens thereafter.