Kimi is a natively quantized model; the lossless full-precision release is 595GB. Your own link mentions that.

CamperBob2 3 hours ago | parent | next [-]

So, realistically, $100K for an 8x RTX 6000 Pro system that can run it at a usable rate.

I think people will always disagree on what qualifies as a "usable rate". But keep in mind that practically no one sensible is running the latest Opus or GPT around the clock, especially not at sustainable, unsubsidized prices. With open-weights models it's easy to do that.

	walrus01 2 hours ago | parent [-]
		Also, for people working with medical, privacy-sensitive, or otherwise confidential data, there's an almost incalculable value (depending on industry niche) in having absolutely no external network traffic to any servers or systems you don't fully control.

walrus01 2 hours ago | parent | prev [-]

The 'unsloth' link above is from a third party that has quantized it to Q8; the original release is considerably larger than 600GB:

https://huggingface.co/moonshotai/Kimi-K2.6

	zozbot234 2 hours ago | parent [-]
		That page mentions that the model is natively INT4 for most of the params, and 600GB is in the ballpark of what's available there for download.
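
For anyone who wants to sanity-check the sizes being argued about in this thread, here is a rough back-of-envelope sketch. It is not from any of the posters. The parameter count (about 1 trillion) and the 96 GB of VRAM per RTX 6000 Pro are assumptions, as is the 10% headroom reserved for KV cache and activations; only the 595GB figure and the 8-GPU setup come from the comments above.

```python
# Back-of-envelope memory math for the sizes discussed in this thread.
# ASSUMPTIONS (not stated in the thread): ~1e12 total parameters,
# 96 GB of VRAM per GPU, 10% of VRAM reserved for KV cache/activations.

GB = 1e9  # decimal gigabytes, as used on download pages


def weights_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB for a given precision."""
    return params * bits_per_param / 8 / GB


def fits(model_gb: float, gpus: int, vram_gb_per_gpu: float,
         overhead_fraction: float = 0.1) -> bool:
    """True if the weights fit in pooled VRAM after reserving some headroom."""
    usable = gpus * vram_gb_per_gpu * (1 - overhead_fraction)
    return model_gb <= usable


if __name__ == "__main__":
    assumed_params = 1e12  # hypothetical parameter count
    print("INT4 weights only:", weights_gb(assumed_params, 4), "GB")
    print("BF16 weights only:", weights_gb(assumed_params, 16), "GB")
    print("595 GB on 8 x 96 GB:", fits(595, gpus=8, vram_gb_per_gpu=96))
```

Under those assumptions, INT4 for most parameters lands in the same ballpark as the 595GB figure once the non-quantized layers are counted, while a 16-bit copy would be several times larger. Whether the remaining headroom gives a "usable rate" depends on context length and batch size, which this sketch does not model.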