	▲	kiratp 2 hours ago
		By caching they mean “cached in GPU memory”, which is a very scarce resource. Caching to RAM and disk is a thing, but it’s hard to keep performance up that way, and it’s still early days for that tech being deployed anywhere. Disclosure: I work on AI at Microsoft. The above is just common industry info (see the work happening in vLLM, for example).
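
To make the tiering idea concrete, here is a toy sketch of a small "fast" tier (standing in for GPU memory) that spills evicted entries into a larger "slow" tier (standing in for host RAM or disk). Everything in it is an assumption for illustration: `TieredPrefixCache`, its capacities, and the LRU policy are hypothetical and do not reflect how vLLM or any production system actually manages its cache.

```python
from collections import OrderedDict


class TieredPrefixCache:
    """Toy two-tier LRU cache (hypothetical, for illustration only)."""

    def __init__(self, fast_capacity, slow_capacity):
        self.fast = OrderedDict()  # stands in for scarce GPU memory
        self.slow = OrderedDict()  # stands in for host RAM or disk
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def put(self, key, value):
        # An entry lives in exactly one tier, so drop any slow-tier copy.
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        # Spill least recently used entries from the fast tier to the slow tier.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
            self.slow.move_to_end(old_key)
            # The slow tier is also bounded; its oldest entries are dropped.
            while len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)

    def get(self, key):
        """Return (value, tier), where tier is 'fast', 'slow', or 'miss'."""
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key], "fast"
        if key in self.slow:
            # A slow-tier hit is promoted back into the fast tier.
            value = self.slow.pop(key)
            self.put(key, value)
            return value, "slow"
        return None, "miss"


cache = TieredPrefixCache(fast_capacity=2, slow_capacity=2)
for name in ("a", "b", "c"):
    cache.put(name, name.upper())
print(cache.get("a"))  # "a" was spilled to the slow tier, so this is a slow hit
print(cache.get("a"))  # now promoted, so this is a fast hit
```

In a real system a slow-tier hit means copying data back into GPU memory before it can be used, which is where the performance problem mentioned above comes from; the sketch only tracks which tier an entry lives in.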