		lxgr an hour ago
		That said, the KV cache makes inference very much not stateless, so internally, inference API providers are strongly incentivized to route each request to the instance that already has as much of its prompt prefix cached as possible.
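To illustrate the routing idea, here is a minimal sketch of prefix-aware routing. It assumes a hypothetical `Instance` type that simply records the token sequences it has KV-cached plus a count of in-flight requests. It picks the instance with the longest cached prefix of the incoming prompt and breaks ties by lower load. The names `Instance`, `cached_prefix_len`, and `route` are made up for illustration. A real router would probably index prefixes in a trie and balance load against cache hits more carefully than this linear scan does.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical inference replica: token sequences it has KV-cached, plus current load."""
    name: str
    cached_sequences: list[list[int]] = field(default_factory=list)
    active_requests: int = 0


def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Count the leading tokens shared by sequences a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cached_prefix_len(instance: Instance, prompt: list[int]) -> int:
    """Length of the longest prefix of `prompt` this instance already has cached."""
    return max(
        (common_prefix_len(seq, prompt) for seq in instance.cached_sequences),
        default=0,
    )


def route(instances: list[Instance], prompt: list[int]) -> Instance:
    """Prefer the longest cached prefix (fewest tokens to prefill); on a tie, the least-loaded instance."""
    return max(
        instances,
        key=lambda inst: (cached_prefix_len(inst, prompt), -inst.active_requests),
    )


if __name__ == "__main__":
    a = Instance("a", [[1, 2, 3, 4]], active_requests=5)
    b = Instance("b", [[1, 2, 9]], active_requests=0)
    print(route([a, b], [1, 2, 3, 4, 5]).name)  # a: 4 cached tokens beat 2
    print(route([a, b], [7, 8]).name)           # b: no cache hit anywhere, so lower load wins
```

Here the busier instance `a` still wins the first request, because reusing four cached tokens saves more prefill work than the load difference costs under this simple scoring.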