lumost 3 hours ago
The KV cache is order dependent, and it depends on the context of the tokens that come before the cached span. There are some transformation approaches to reuse the KV cache across inferences, but none are in wide use due to accuracy concerns after the transformation.
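A toy sketch of the context-dependence point, in pure Python. Everything here is an assumption for illustration: two stacked single-head causal attention layers with random weights, 4-dimensional embeddings, and no positional encoding. Real models add many heads and layers, MLPs, norms and positional schemes such as RoPE, which would make even the first layer's keys position dependent. The same two "document" tokens are cached after two different one-token prefixes.

```python
import math
import random


def matvec(m, v):
    """Multiply a d x d matrix (list of rows) by a length-d vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def causal_attention(xs, wq, wk, wv):
    """One single-head causal self-attention layer with a residual connection.

    Returns the layer outputs plus the K and V vectors a KV cache would store.
    """
    d = len(xs[0])
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    outs = []
    for i, q in enumerate(qs):
        # Token i may only attend to positions 0..i (causal mask).
        scores = [sum(a * b for a, b in zip(q, ks[j])) / math.sqrt(d) for j in range(i + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended = [sum(weights[j] * vs[j][c] for j in range(i + 1)) / total for c in range(d)]
        outs.append([x + a for x, a in zip(xs[i], attended)])
    return outs, ks, vs


def build_kv_cache(xs, layers):
    """Run xs through a stack of layers, collecting each layer's (K, V)."""
    cache = []
    h = xs
    for wq, wk, wv in layers:
        h, ks, vs = causal_attention(h, wq, wk, wv)
        cache.append((ks, vs))
    return cache


rng = random.Random(0)
d = 4


def rand_matrix():
    """Hypothetical stand-in for a trained projection matrix."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]


layers = [(rand_matrix(), rand_matrix(), rand_matrix()) for _ in range(2)]
embed = {tok: [rng.gauss(0.0, 1.0) for _ in range(d)] for tok in ["A", "B", "doc1", "doc2"]}

document = ["doc1", "doc2"]
cache_a = build_kv_cache([embed[t] for t in ["A"] + document], layers)
cache_b = build_kv_cache([embed[t] for t in ["B"] + document], layers)

# Layer 0 keys for the document tokens match (no positional encoding here) ...
layer0_same = cache_a[0][0][1:] == cache_b[0][0][1:]
# ... but layer 1 keys differ, because layer 0's output already mixed in the prefix.
layer1_same = cache_a[1][0][1:] == cache_b[1][0][1:]
print(layer0_same, layer1_same)  # True False
```

Only the first layer's entries survive a change of prefix in this toy; every deeper layer encodes what came before, so splicing a cache computed under one prefix behind a different prefix gives different K/V than a full prefill would.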
xg15 40 minutes ago
Isn't it also, most fundamentally, dependent on the model weights? My understanding was that the KV cache stores nothing more than the "activations" of the W_k and W_v matrices of an attention module for a given input sequence. So I don't quite understand how this is supposed to work:

> Let a publisher precompute a document's KV cache, and let every other agent buy the right to load it and skip prefill.

Should a publisher precompute the cache for every popular model that is out there?
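A minimal sketch of the weights point, assuming a single attention layer, made-up 4-dimensional hidden states, and seeded random matrices standing in for the weights of two hypothetical models. The cached K and V are just W_k x and W_v x, so they are tied to one specific set of weights.

```python
import random


def project(w, x):
    """Multiply a d x d weight matrix (list of rows) by a length-d vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def random_weights(seed, d):
    """Stand-in for a trained projection matrix: seeded Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]


def layer_kv(xs, w_k, w_v):
    """K and V rows that one attention layer would cache for inputs xs."""
    return [project(w_k, x) for x in xs], [project(w_v, x) for x in xs]


d = 4
# Hypothetical hidden states for two document tokens, fed identically to both
# models (generous: real models would not even share embeddings).
doc = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -1.0, 0.25]]

model_a = (random_weights(1, d), random_weights(2, d))  # (W_k, W_v) for "model A"
model_b = (random_weights(3, d), random_weights(4, d))  # (W_k, W_v) for "model B"

kv_a = layer_kv(doc, *model_a)
kv_b = layer_kv(doc, *model_b)

# Same model, same input: the cache is reproducible ...
print(kv_a == layer_kv(doc, *model_a))  # True
# ... but a different model's cache for the same document is not interchangeable.
print(kv_a == kv_b)  # False
```

In practice the mismatch is worse than this: different models usually differ in layer count, head count and hidden size, so one model's cache would not even have the right shape for another.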
Eridrus 2 hours ago
The paper has a section on "Reusing precomputed KV across queries" which talks about how other papers have tried to address this problem, but yeah, this paper adds nothing on its own besides a catchy title.
dgellow 3 hours ago
Just curious, do you have links to read more about transformations or other techniques for KV cache reuse?
TZubiri an hour ago
Absolute slop paper. Replace "document" with "text" and you'll get it: "People are asking the same questions and an answer is generated every time, what if we could like cache the questions and their answers..." Sounds like someone was using ChatGPT to understand how ChatGPT works and then asked it to generate a paper based on his proposal to improve it.