Can I Buy Your KV Cache?

The KV cache is order-dependent: each cached entry depends on the context of tokens that come before it.

There are some transformation approaches for reusing the KV cache across inferences, but none are in wide use because of accuracy concerns after the transformation.
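To make the order dependence concrete, here is a minimal Python sketch of prefix-style reuse. It assumes a causal decoder, where the cached keys and values at position i depend only on tokens 0..i, so cached work is valid exactly as far as a new prompt matches a cached sequence token by token. The class `PrefixKVCache` and its methods are hypothetical and track only token IDs, not real key/value tensors.

```python
from typing import Sequence


class PrefixKVCache:
    """Hypothetical toy cache: records which token sequences have been prefilled.

    A real KV cache stores per-layer key/value tensors; this sketch keeps only
    the token IDs so the reuse rule is easy to see.
    """

    def __init__(self) -> None:
        self._prefixes = set()  # set of tuples of token IDs

    def add(self, tokens: Sequence[int]) -> None:
        # Remember that KV entries were computed for exactly this sequence.
        self._prefixes.add(tuple(tokens))

    def longest_prefix(self, tokens: Sequence[int]) -> int:
        """Return how many leading tokens of `tokens` can reuse cached KV."""
        best = 0
        for cached in self._prefixes:
            n = 0
            # Entries stay valid only while tokens match position by position;
            # the first mismatch invalidates every later position.
            for a, b in zip(cached, tokens):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        return best


cache = PrefixKVCache()
cache.add([1, 2, 3, 4])
print(cache.longest_prefix([1, 2, 3, 9]))     # 3: shared prefix is reused
print(cache.longest_prefix([2, 1, 3, 4]))     # 0: same tokens, different order
print(cache.longest_prefix([9, 1, 2, 3, 4]))  # 0: same tokens, shifted by one
```

The last two calls show why reuse across different prompts is hard: identical tokens in a different order or at a different offset get nothing from the cache.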

Eridrus an hour ago

The paper has a section on "Reusing precomputed KV across queries" which talks about how other papers have tried to address this problem, but yeah, this paper adds nothing on its own besides a catchy title.

dgellow an hour ago

Just curious, do you have links to read more about transformations or other techniques for KV cache reuse?

evrydayhustling an hour ago

All major model providers offer prefix caching, which is this.

lumost 35 minutes ago

No, reusing segments of the KV cache for different purposes in an order-independent manner is an active research area.

dgellow 25 minutes ago

Any keyword or paper I can search for?

mistercow an hour ago

> Then the part that matters: where the KV lives

When your abstract is clearly LLM-generated and not even edited to sound human, it does not make me want to read your paper.

TuringNYC 15 minutes ago

Seems Cloudflare is now doing this for scraping, so makes sense to continue down the pipeline!

refulgentis 14 minutes ago

This paper doesn't make any sense. For background, since 2022 I've maintained an AI client that's cross-platform, cross-provider, and integrates llama.cpp. I don't know why they think "agents" don't share prefill work: paid providers cache on the prefill text, llama.cpp does the same, and I specifically hooked up llama.cpp so it can reuse subsets as well. In other words, all the agents would reuse the cache.

It reads like it started from an underspecified notion of "agents" combined with a strain of pop wisdom about the "KV cache" that I've seen enter mainstream discourse over the past 3 months and that is Not Even Wrong; then they solved a non-existent problem.

EDIT: Flagging, based on the rest of the comments, which either request a primer on the terms or point out errors that are even more obvious.

christianqchung 6 minutes ago

I don't think Luoyuan Zhang is necessarily doing this, but I'm pretty sure lots of people are using arXiv as a glorified blog and hoping no one notices.

tonetegeatinst an hour ago

Does anyone have a good recommendation for an explainer or primer on the KV cache?

plutomeetsyou 40 minutes ago

convert this question to KV cache and give it to your agent

sghiassy an hour ago

A truly global singleton

root-parent 2 hours ago

Lambda computing for prompts?