aetherspawn 4 hours ago
I wish they would publish the requirements for running on llama.cpp alongside any announcement of open models. A bonus would be tok/s on common hardware.
lcampbell 3 hours ago
I don't think llama.cpp supports any of the LongCat models, actually. They haven't posted weights or inference solutions for LongCat-2.0 [1], but LongCat-Next had transformers support, which I assume means it works with vLLM/SGLang.

Given it's 1.6T, "common hardware" is probably out of the question: even at 2bpw the weights alone come to 400GB, before you consider the bandwidth requirements for 48B active parameters. I haven't read the LongCat-2.0 architecture docs, but if you're not running GLM-5.2, you're probably not running this either.

[1] https://huggingface.co/meituan-longcat/LongCat-2.0: "Model weights coming soon — stay tuned!"
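For anyone who wants to redo the napkin math, here's a rough sketch. It counts weight storage only (no KV cache, activations, or quantization overhead such as scales), and the decode estimate assumes generation is purely memory-bandwidth bound, with every active weight read once per token. The 800 GB/s bandwidth figure is a hypothetical placeholder, not a measurement of any particular machine.

```python
def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal GB (1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def decode_tok_per_s_ceiling(active_params: float, bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if each token streams all active weights
    from memory once. Ignores KV cache reads, activations, and compute."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    total = 1.6e12   # 1.6T total parameters
    active = 48e9    # 48B active parameters per token
    print(f"{weight_footprint_gb(total, 2):.0f} GB at 2bpw")
    # 800 GB/s is a hypothetical bandwidth, chosen only for illustration
    print(f"{decode_tok_per_s_ceiling(active, 2, 800):.1f} tok/s ceiling")
```

Real throughput would land below that ceiling, and the 400GB has to fit somewhere first.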
| |||||||||||||||||||||||