zargon an hour ago
		The Qwen3.5 series is a bit of an exception to the general rule here. Its KV cache is incredibly size-efficient. I think the max context (262k) fits in 3GB at q8, iirc. I prefer to keep the cache at full precision though.
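
For anyone who wants to sanity-check a figure like that, here is a rough back-of-the-envelope sketch of the usual KV cache size arithmetic. The layer count, KV head count, and head dimension below are made-up placeholders, not Qwen3.5's actual config, and `kv_cache_bytes` is just an illustrative helper. It also treats q8 as roughly 1 byte per element and ignores any per-block scale overhead a real quantization format adds.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Approximate KV cache size for one sequence.

    Each attention layer stores one key vector and one value vector
    (hence the factor of 2) per KV head, per token.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


# Hypothetical config for illustration only; NOT taken from any real model card.
cfg = dict(n_attn_layers=8, n_kv_heads=2, head_dim=256, context_len=262_144)

q8_bytes = kv_cache_bytes(**cfg, bytes_per_elem=1)   # ~1 byte/element (assumption)
f16_bytes = kv_cache_bytes(**cfg, bytes_per_elem=2)  # 16-bit cache

print(f"q8:  {to_gib(q8_bytes):.1f} GiB")
print(f"f16: {to_gib(f16_bytes):.1f} GiB")
```

With these placeholder numbers the cache works out to 2 GiB at one byte per element and doubles at 16 bits, which is the tradeoff when keeping the cache at higher precision. The cost scales linearly with every factor, so a model with few KV heads, or with only some layers keeping a full cache, gets a proportionally smaller footprint at long context.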