Remix clone Hacker News

	Bombthecat 16 hours ago
		You still need to hold the model in memory. If you have, for example, 16 GB of RAM, the gains aren't that large.
	anon373839 15 hours ago | parent
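		At scale, the model weights aren't what consumes the most memory. The weights are loaded once and shared across requests, while the KV caches are per-user and grow with context length.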
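
A rough back-of-the-envelope sketch of the two points above. The model shape here (32 layers, 32 KV heads, head dimension 128, fp16 values, a 4096-token context, 7B parameters) is an illustrative assumption, not something stated in the thread, and the function names are made up for this example. It uses the usual estimate that a KV cache stores one key and one value vector per layer, per KV head, per token.

```python
GIB = 1024 ** 3

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for ONE user/sequence.

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

def weight_bytes(params, bytes_per_param=2):
    """Approximate weight memory, shared by all users (fp16 assumed)."""
    return params * bytes_per_param

# Hypothetical 7B-parameter model shape (assumption, for illustration only).
per_user = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
weights = weight_bytes(7_000_000_000)

for users in (1, 10, 100):
    total_kv = users * per_user
    print(f"{users:>3} users: weights {weights / GIB:.1f} GiB, "
          f"KV caches {total_kv / GIB:.1f} GiB")
```

Under these assumptions a single local user is dominated by the weights, which matches the first comment, while at around ten or more concurrent full-context users the KV caches overtake the weights, which matches the reply. Real deployments vary: grouped-query attention, cache quantization, and shorter contexts all shrink the per-user figure.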