Bombthecat 16 hours ago

You still need to hold the model in memory. If you have, for example, 16 GB of RAM, the gains aren't that large.

anon373839 15 hours ago | parent [-]

That's not what consumes the most memory at scale. The KV caches are per-user.
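
A rough back-of-the-envelope sketch of that point, under assumed numbers: the layer count, KV head count, head dimension, context length, and user count below are hypothetical and not taken from any specific model. It treats the KV cache as one key and one value vector per layer per token, stored in fp16, with nothing shared across users.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for ONE user/sequence.

    The factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage (an assumption).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def weights_bytes(n_params, bytes_per_param=2):
    """Approximate size of the model weights, shared by all users."""
    return n_params * bytes_per_param


GIB = 2**30

# Hypothetical 8B-parameter model; all values are illustrative assumptions.
per_user = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)
weights = weights_bytes(8_000_000_000)

for users in (1, 16, 64):
    total_kv = users * per_user
    print(f"{users:>3} users: KV cache {total_kv / GIB:6.1f} GiB, "
          f"weights {weights / GIB:5.1f} GiB")
```

With these assumed numbers the weights are a fixed cost, while the KV cache grows linearly with concurrent users and overtakes the weights somewhere past a dozen or so full-length contexts.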