walrus01 3 hours ago

People thinking to self-host Kimi K2.6 had better be prepared for how big it is.

Q8_K_XL quantization, for instance, is around 600GB on disk. I'd bet you'd need about 700GB of VRAM.

Quantizations lower than Q8 are probably worthless for quality.

Or 2.05TB on disk for the full-precision GGUF.

https://huggingface.co/unsloth/Kimi-K2.6-GGUF

If you can afford the hardware to run Kimi K2.6 at a decent speed for more than one simultaneous user, you probably already have a team on staff who know how to benchmark it against Claude, GPT-5.5, etc.

zozbot234 3 hours ago

Kimi is a natively quantized model; the lossless full-precision release is 595GB. Your own link mentions that.
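The roughly 4x gap between the 2.05TB GGUF and the 595GB native release is what you'd expect if the GGUF stores weights in BF16 (two bytes per parameter) while the native release stores most of them in INT4 (half a byte per parameter). A minimal back-of-the-envelope sketch, assuming roughly 1 trillion total parameters, the figure published for earlier Kimi K2 releases; the thread itself doesn't state K2.6's parameter count, so treat `TOTAL_PARAMS` as an assumption.

```python
# Back-of-the-envelope weight-storage estimate.
# ASSUMPTION: ~1e12 total parameters (the size published for earlier
# Kimi K2 releases); the thread does not give K2.6's parameter count.
TOTAL_PARAMS = 1.0e12

# Bytes needed to store one parameter in each weight format.
BYTES_PER_PARAM = {
    "int4": 0.5,  # 4 bits per weight
    "bf16": 2.0,  # 16 bits per weight
}


def weights_gb(params: float, fmt: str) -> float:
    """Raw weight size in decimal gigabytes (1 GB = 1e9 bytes),
    ignoring quantization scales and file metadata."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in ("int4", "bf16"):
    print(f"{fmt:>5}: {weights_gb(TOTAL_PARAMS, fmt):,.0f} GB")
```

Under that assumption, pure INT4 comes to about 500GB and BF16 to about 2,000GB. The native release sitting somewhat above 500GB fits with most, but not all, of the parameters being stored as INT4.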

CamperBob2 3 hours ago

So, realistically, $100K for an 8x RTX PRO 6000 system that can run it at a usable rate.
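The "8x" figure can be sanity-checked with simple division. This sketch assumes 96GB of VRAM per RTX PRO 6000 card and uses the ~600GB weights and ~700GB VRAM guess from upthread; `overhead_frac`, standing in for KV cache and activations, is a hypothetical knob, since the real overhead depends on context length, batch size, and the serving engine.

```python
import math

# ASSUMPTION: 96 GB of VRAM per RTX PRO 6000 (Blackwell) card.
GPU_VRAM_GB = 96


def gpus_needed(weights_gb: float, overhead_frac: float,
                gpu_vram_gb: float = GPU_VRAM_GB) -> int:
    """Smallest whole number of GPUs whose combined VRAM holds the weights
    plus a hypothetical fractional overhead (KV cache, activations)."""
    total_gb = weights_gb * (1.0 + overhead_frac)
    return math.ceil(total_gb / gpu_vram_gb)


# ~600 GB of weights with ~17% headroom lands near the ~700 GB guess.
for overhead in (0.0, 0.17, 0.25):
    n = gpus_needed(600, overhead)
    print(f"overhead {overhead:.0%}: {n} GPUs, {n * GPU_VRAM_GB} GB total")
```

Under these assumptions the weights alone would squeeze onto seven cards, and eight cards (768GB) leave roughly 168GB for cache and activations. In practice, tensor-parallel serving usually wants a GPU count that evenly splits the model's attention heads, which tends to favor 8 over 7 anyway.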

zozbot234 3 hours ago

I think people will always disagree on what qualifies as a "usable rate". But keep in mind that practically no one sensible is running the latest Opus or GPT around the clock, especially not at sustainable, unsubsidized prices. With open-weights models it's easy to do that.
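One way to make "around the clock" concrete is to amortize owned hardware over its service life. In this sketch every input except the ~$100K price from upthread is a hypothetical placeholder: the three-year lifetime, the power draw, the electricity price, and especially `tokens_per_second`, which is not a measured Kimi K2.6 figure. Swap in your own numbers.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Deployment:
    capex_usd: float          # up-front hardware cost
    lifetime_years: float     # HYPOTHETICAL depreciation period
    power_kw: float           # HYPOTHETICAL average wall power draw
    usd_per_kwh: float        # HYPOTHETICAL electricity price
    tokens_per_second: float  # HYPOTHETICAL aggregate output throughput

    def cost_per_hour(self) -> float:
        """Straight-line depreciation plus electricity, per hour.
        Simplification: power draw is treated as constant, busy or idle."""
        capex_hourly = self.capex_usd / (self.lifetime_years * HOURS_PER_YEAR)
        power_hourly = self.power_kw * self.usd_per_kwh
        return capex_hourly + power_hourly

    def cost_per_million_tokens(self, utilization: float = 1.0) -> float:
        """Cost per 1M output tokens when the box is busy this fraction of the time."""
        if not 0.0 < utilization <= 1.0:
            raise ValueError("utilization must be in (0, 1]")
        tokens_per_hour = self.tokens_per_second * 3600 * utilization
        return self.cost_per_hour() / tokens_per_hour * 1e6


box = Deployment(capex_usd=100_000, lifetime_years=3, power_kw=5.0,
                 usd_per_kwh=0.15, tokens_per_second=500)
print(f"${box.cost_per_hour():.2f}/hour")
for u in (1.0, 0.25):
    print(f"utilization {u:.0%}: ${box.cost_per_million_tokens(u):.2f} per 1M tokens")
```

The capex term doesn't shrink when the box sits idle, so per-token cost falls as utilization rises. That is the sense in which owned open-weights hardware rewards running around the clock.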

walrus01 2 hours ago

Also, for people doing anything medical, privacy-related, or involving sensitive data, there's almost incalculable value (depending on the industry niche) in having no external network traffic at all to any servers or systems you don't fully control.

walrus01 2 hours ago

The Unsloth link above is from a third party that has quantized it to Q8; the original release is considerably larger than 600GB:

https://huggingface.co/moonshotai/Kimi-K2.6

zozbot234 2 hours ago

That page mentions that the model is natively INT4 for most of the params, and 600GB is in the ballpark of what's available there for download.