Remix clone Hacker News

ben_s 3 days ago
		Once you oversubscribe GPU memory, performance usually collapses. Frameworks like vLLM can explicitly offload things like the KV cache to CPU memory, but that's an application-level tradeoff, not transparent GPU virtual memory.
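To make "application-level" concrete, here is a minimal Python sketch of explicit KV-cache offloading. It assumes a block-based cache with least-recently-used eviction. The class name `KVBlockManager` and its methods are hypothetical and are not vLLM's API. GPU and CPU memory are modeled as plain dicts rather than real device buffers.

```python
from collections import OrderedDict


class KVBlockManager:
    """Hypothetical sketch of application-level KV-cache offloading.

    GPU and CPU memory are modeled as dicts; a real engine would copy
    tensors between devices. This is NOT vLLM's actual interface.
    """

    def __init__(self, gpu_capacity):
        if gpu_capacity < 1:
            raise ValueError("gpu_capacity must be at least 1")
        self.gpu_capacity = gpu_capacity  # max blocks resident on the "GPU"
        self.gpu = OrderedDict()          # resident blocks, oldest use first
        self.cpu = {}                     # blocks the app chose to offload
        self.swaps_out = 0
        self.swaps_in = 0

    def _make_room(self):
        # Evict least-recently-used blocks to CPU until one slot is free.
        while len(self.gpu) >= self.gpu_capacity:
            victim, data = self.gpu.popitem(last=False)
            self.cpu[victim] = data
            self.swaps_out += 1

    def allocate(self, block_id, data):
        self._make_room()
        self.gpu[block_id] = data

    def access(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)  # mark as most recently used
            return self.gpu[block_id]
        # Block was offloaded: the application explicitly pays the copy cost.
        data = self.cpu.pop(block_id)
        self._make_room()
        self.gpu[block_id] = data
        self.swaps_in += 1
        return data


m = KVBlockManager(gpu_capacity=2)
m.allocate(0, b"a")
m.allocate(1, b"b")
m.allocate(2, b"c")  # evicts block 0 to CPU
m.access(0)          # brings block 0 back, evicting block 1
```

The point is that the engine decides which blocks move and when, so it can bound and schedule the copies. With transparent oversubscription (e.g. CUDA Unified Memory), the driver migrates pages on demand without knowing which data the model will need next.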