Aurornis 2 hours ago

Good point, but you still need room for the KV cache and more. Fitting the model weights alone into RAM doesn't get the job done.

segmondy 2 hours ago

Yeah, it doesn't take much. I'm looking at it right now: the KV cache is about 4 GB of VRAM, and the compute buffer is ~1.5 GB at the full 128k context.
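For a rough sense of where a number like that comes from, here's a back-of-the-envelope sketch. It assumes a standard transformer cache that stores one key and one value vector per layer, per KV head, per token. The model shape below is hypothetical and chosen only for illustration (the comment doesn't name the model), and `kv_cache_bytes` is just a helper name for this sketch:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: float) -> int:
    """Approximate K+V cache size for a single sequence.

    The leading factor of 2 covers storing both keys and values.
    Ignores any per-block overhead that quantized cache formats add.
    """
    return int(2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem)


GIB = 1024 ** 3

# Hypothetical model shape (illustration only, not the model discussed above).
size = kv_cache_bytes(n_layers=16, n_kv_heads=8, head_dim=64,
                      context_len=128 * 1024, bytes_per_elem=2)  # fp16 cache
print(f"KV cache: {size / GIB:.1f} GiB")
```

The cost grows linearly with context length and with the number of KV heads, which is why grouped-query attention (fewer KV heads than query heads) and a quantized cache (fewer bytes per element) both shrink it a lot. Total VRAM is then roughly weights + KV cache + compute buffer.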