	▲	Aurornis 2 hours ago
		Good point, but you still need the KV cache and more. Fitting just the model into RAM doesn’t get the job done.
	▲	segmondy 2 hours ago
		Yeah, it doesn't take much. I'm looking at it right now: the KV cache is about 4 GB of VRAM, and the compute buffer is about 1.5 GB at full 128k context.
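For anyone wanting to sanity-check numbers like these, here is a rough sketch of the usual KV cache estimate for a standard transformer with grouped-query attention. The model shape below (32 layers, 4 KV heads, head dimension 128, 8-bit cache) is hypothetical and picked only for illustration; the thread does not say which model or cache precision is in use, and the function name `kv_cache_bytes` is my own.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Estimate KV cache size in bytes for a plain GQA/MHA transformer.

    The leading factor of 2 accounts for storing both keys and values
    for every layer, KV head, and token position.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical model shape (an assumption, not taken from the thread).
cache = kv_cache_bytes(
    n_layers=32,
    n_kv_heads=4,
    head_dim=128,
    context_len=128 * 1024,  # "full 128k context"
    bytes_per_elem=1,        # assumes an 8-bit quantized cache
)

# Compute buffer figure as reported in the comment above (about 1.5 GB).
compute_buffer = 1.5 * GIB

print(f"KV cache: {cache / GIB:.1f} GiB")  # KV cache: 4.0 GiB
print(f"KV cache + compute buffer: {(cache + compute_buffer) / GIB:.1f} GiB")
```

The cache grows linearly with context length, so halving the context halves it. The same shape with a 16-bit cache would need twice as much. Either way this overhead sits on top of the model weights.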