High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

	High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction (jchandra.com)
		14 points by jchandra 2 days ago \| 1 comment

	vivahir215 2 days ago
		Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?