Can I Buy Your KV Cache? (arxiv.org)
23 points by MediaSquirrel 2 hours ago | 14 comments
lumost an hour ago
The KV cache is order-dependent: each entry depends on the tokens that precede it in the context. There are some transformation approaches for reusing the KV cache across inferences, but none are in wide use because of accuracy concerns after the transformation.
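To make the context dependence concrete, here is a toy sketch rather than a real model. It assumes two layers of causal self-attention with identity projections and no positional encoding; the embedding table `EMB` and the helper names are hypothetical. The same token ends up with a different layer-2 key when the prefix changes, while a shared prefix gives identical keys.

```python
import math

# Hypothetical 2-d embeddings. Real models learn these, along with the
# W_q, W_k, W_v projections that this sketch replaces with the identity.
EMB = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}


def causal_attention(xs):
    """One attention layer with a residual; position i sees only 0..i."""
    out = []
    for i, q in enumerate(xs):
        scores = [sum(qd * kd for qd, kd in zip(q, xs[j])) for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        mixed = [
            sum(w * xs[j][d] for j, w in enumerate(weights)) / total
            for d in range(len(q))
        ]
        out.append([qd + md for qd, md in zip(q, mixed)])
    return out


def layer2_keys(tokens):
    """Keys a layer-2 KV cache would hold (identity W_k, so keys = layer-1 outputs)."""
    return causal_attention([EMB[t] for t in tokens])


k_after_a = layer2_keys(["a", "c"])[1]  # key for "c" when preceded by "a"
k_after_b = layer2_keys(["b", "c"])[1]  # key for "c" when preceded by "b"
print(k_after_a != k_after_b)  # True: same token, different prefix, different key
```

Because attention is causal, extending a sequence leaves the earlier keys unchanged, which is why reuse works for a shared prefix but not for an arbitrary chunk moved to a new context.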
mistercow an hour ago
> Then the part that matters: where the KV lives

When your abstract was clearly generated by an LLM and not curated to at least make it sound human, it does not make me want to read your paper.
TuringNYC 15 minutes ago
Seems Cloudflare is now doing this for scraping, so makes sense to continue down the pipeline!
refulgentis 14 minutes ago
This paper doesn't make any sense - for background, I've maintained an AI client that's cross-platform, cross-provider, and integrates llama.cpp since 2022.

I don't know why they think "agents" don't share prefill work - paid providers cache on the prefill text, llama.cpp does the same, and I specifically hooked up llama.cpp so it can do subsets as well, i.e. all the agents would reuse the cache.

It reads like it started from an underspecification of "agents" x a strain of pop-wisdom about "KV cache" that I've seen enter mainstream discourse over the past 3 months that is Not Even Wrong, and then they solved a non-existent problem.

EDIT: based on the rest of the comments either requesting a primer on terms or pointing out that it makes errors in even more obvious ways, flagging.
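For readers unfamiliar with what "caching on the prefill text" means, a minimal sketch of prefix reuse follows. It is not llama.cpp's or any provider's actual implementation; `PrefixCache`, `shared_prefix_len`, and the integer token ids are hypothetical names chosen for illustration. The idea is that a new request only needs prefill for the tokens after the longest prefix it shares with something already cached.

```python
def shared_prefix_len(cached, new):
    """Number of leading tokens the two sequences have in common."""
    n = 0
    for x, y in zip(cached, new):
        if x != y:
            break
        n += 1
    return n


class PrefixCache:
    """Tracks which token sequences already have KV entries (the KV tensors are omitted)."""

    def __init__(self):
        self._entries = []

    def insert(self, tokens):
        self._entries.append(list(tokens))

    def lookup(self, tokens):
        """Return how many leading tokens of `tokens` can skip prefill."""
        return max((shared_prefix_len(e, tokens) for e in self._entries), default=0)


# Two "agents" sharing one system prompt (token ids are made up).
system_prompt = [1, 2, 3, 4]
cache = PrefixCache()
cache.insert(system_prompt + [10, 11])          # agent A's request
reusable = cache.lookup(system_prompt + [20])   # agent B's request
print(reusable)  # 4: only agent B's own suffix needs fresh prefill
```

A linear scan keeps the sketch short; a production server would more plausibly use a trie or hashed blocks, but that choice is an assumption here and not something the thread specifies.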
tonetegeatinst an hour ago
Does anyone have a good recommendation for an explainer or primer on the KV cache?
sghiassy an hour ago
A truly global singleton
root-parent 2 hours ago
Lambda computing for prompts?