How are people running this locally? I just checked llama.cpp, and it appears Unsloth has a version, but it hacks a bunch of things to make it work and isn't optimal.

https://github.com/ggml-org/llama.cpp/issues/24730

jeremyjh a day ago | parent [-]

No one is doing that for a model this size. It would have to be so heavily quantized that it wouldn't be useful, or you'd need to spend half a million dollars on hardware. People use hosted APIs. Open weights mean cloud vendors can host it.
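
For a rough sense of the tradeoff, here is a back-of-envelope sketch of how much memory the weights alone need at different quantization levels. The parameter count below is a hypothetical placeholder, not a figure from this thread, and the estimate ignores the KV cache, activations, and runtime overhead, so real requirements would be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# HYPOTHETICAL parameter count, for illustration only.
# Substitute the real figure from the model card.
N_PARAMS = 700e9

for bits in (16, 8, 4, 2):
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{gb:,.0f} GB")
```

Halving the bits per weight halves the footprint, which is why very large models only fit on consumer hardware at aggressive quantization levels.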

malshe a day ago | parent [-]

Can you recommend any US-based cloud providers?

maybe_pablo a day ago | parent | next [-]

In HuggingChat (https://huggingface.co/chat) you can test open models for free and even test specific providers.

From there I collected the following US providers currently serving GLM 5.2:

- Together (https://www.together.ai/models)

- Fireworks (https://fireworks.ai/models)

- Featherless (https://featherless.ai/models)

	malshe a day ago | parent [-]

	That's great. Thank you!

fooster 19 hours ago | parent | prev [-]

Ollama Cloud and Neuralwatt.

	malshe 12 hours ago | parent [-]

	Thanks