Yeah, it doesn't take much. I'm looking at it right now, KV cache is about 4gb of vram, compute buffer =~ 1.5gb at full 128k context.