Remix.run Logo
kiratp 2 hours ago

By caching they mean “cached in GPU memory”. That’s a very very scarce resource.

Caching to RAM and disk is a thing but it’s hard to keep performance up with that and it’s early days of that tech being deployed anywhere.

Disclosure: work on AI at Microsoft. Above is just common industry info (see work happening in vLLM for example)