Remix clone Hacker News


	kennethops 10 hours ago
		Do you know if they did this to it? https://research.google/blog/turboquant-redefining-ai-effici...
	kgeist 10 hours ago | parent
		Llama.cpp already uses an idea from it internally for the KV cache [0], so a quantized KV cache should now see less degradation.

		[0] https://github.com/ggml-org/llama.cpp/pull/21038