latchkey 2 hours ago

Kimi is the latest model that doesn't run correctly on AMD. It's apparently close to DeepSeek in design, but different enough that it just doesn't work.

It isn't just the model, it's the engine that runs it. From what I understand, this model works with sglang, but not with vLLM.

suprjami 5 minutes ago | parent [-]

This is normal. An inference engine needs support for each model's particular implementation of the transformer architecture. This has been true for almost every model release since we got local weights.

Really good model providers send a launch-day patch to llama.cpp and vLLM to make sure people can run their model instantly.

latchkey a few seconds ago | parent [-]

It isn't about whether it's normal. It's that those patches are written for Nvidia, but not AMD, and that it takes time and energy to vet them and merge them into those projects. Kimi has been out for 3 months now and it still doesn't run out of the box on vLLM on AMD, but it works just fine on Nvidia.