Remix clone Hacker News
Q8 KV cache lets a 30B model fit 100K context on a 24 GB RTX 5090 (buraak.com)
2 points by bozdemir 7 hours ago
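The headline's claim comes down to KV-cache arithmetic. Each layer stores one K vector and one V vector per token for every KV head, so cache size grows linearly with context length and with bytes per stored value. The sketch below estimates this for fp16 and for an 8-bit cache. It is a minimal estimate under stated assumptions. The layer count, KV-head count and head dimension are hypothetical values for a 30B-class model with grouped-query attention, not figures from the linked article. The 8-bit format assumes a llama.cpp-style `q8_0` block (32 int8 values sharing one fp16 scale, so 8.5 bits per value). The helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheFormat:
    name: str
    bits_per_elem: float


FP16 = CacheFormat("fp16", 16.0)
# Assumption: q8_0-style block = 32 int8 values + one fp16 scale
# = 34 bytes per 32 values = 8.5 bits per value.
Q8_0 = CacheFormat("q8_0", 8.5)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, fmt: CacheFormat) -> float:
    """Estimate KV-cache size in bytes (weights and runtime overhead excluded)."""
    # Factor 2: one K tensor and one V tensor per layer,
    # each holding n_kv_heads * head_dim values per token.
    n_values = 2 * n_layers * n_kv_heads * head_dim * context_len
    return n_values * fmt.bits_per_elem / 8


if __name__ == "__main__":
    # Hypothetical 30B-class shape with grouped-query attention (not from the article).
    shape = dict(n_layers=48, n_kv_heads=4, head_dim=128, context_len=100_000)
    for fmt in (FP16, Q8_0):
        gib = kv_cache_bytes(**shape, fmt=fmt) / 2**30
        print(f"{fmt.name}: {gib:.2f} GiB")
```

With these assumed dimensions, a 100K-token cache takes 9,830,400,000 bytes (about 9.16 GiB) in fp16 and 5,222,400,000 bytes (about 4.86 GiB) in `q8_0`, so 8-bit storage roughly halves it. Whether a given model then fits on a given GPU also depends on its weight quantization and runtime buffers, which this estimate ignores.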