High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction (jchandra.com)
14 points by jchandra 2 days ago | 1 comment
vivahir215 2 days ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
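One rough way to probe the Top-K versus OLS + SVD cost question is a NumPy micro-benchmark like the one below. It is a minimal sketch, not the linked article's method. `lowrank_summary` is a hypothetical stand-in that takes a rank-r SVD of the cached keys and then runs an OLS fit of the values. `topk_keep` is a plain dot-product Top-K baseline. The sizes `n`, `d`, `k` and `r` are assumed. It times only the compression step on CPU, not end-to-end inference.

```python
import time
import numpy as np

def topk_keep(keys, values, query, k):
    """Baseline: keep the k cached tokens with the highest dot-product score for `query`."""
    scores = keys @ query                      # (n,) one score per cached token
    idx = np.argpartition(scores, -k)[-k:]     # indices of the k largest scores (unordered)
    return keys[idx], values[idx]

def lowrank_summary(keys, values, r):
    """Hypothetical stand-in: rank-r SVD of keys, then OLS fit of values onto that basis."""
    u, s, vt = np.linalg.svd(keys, full_matrices=False)      # keys = u @ diag(s) @ vt
    summary_keys = s[:r, None] * vt[:r]                       # (r, d) scaled top-r directions
    summary_values, *_ = np.linalg.lstsq(u[:, :r], values, rcond=None)  # (r, d) OLS solve
    return summary_keys, summary_values

def bench(fn, *args, repeats=10):
    """Average wall-clock seconds per call after one warm-up call."""
    fn(*args)
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - t0) / repeats

rng = np.random.default_rng(0)
n, d, k, r = 2048, 128, 256, 32   # assumed: cache length, head dim, kept tokens, rank
keys = rng.standard_normal((n, d)).astype(np.float32)
values = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

t_topk = bench(topk_keep, keys, values, query, k)
t_lr = bench(lowrank_summary, keys, values, r)
print(f"top-k: {t_topk * 1e3:.3f} ms per call, SVD + OLS: {t_lr * 1e3:.3f} ms per call")
```

Asymptotically, scoring plus selection is about O(n·d), while a thin SVD of an n×d matrix is about O(n·d²). That gap is what the comment is pointing at. The sketch does not model how often compression would run relative to decoding steps, so it says nothing on its own about end-to-end latency.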