cmrdporcupine 6 days ago
Use them via DeepInfra instead of z.ai. No reliability issues. https://deepinfra.com/zai-org/GLM-5.1 Looks like it's fp4 quantization now though? Last week it was showing fp8. Hm..
wolttam 6 days ago
DeepInfra's implementation of it is not correct. Thinking is not preserved, and they're not responding to the issue I submitted about it. I also regularly see DeepInfra slow to an absolute crawl; I've actually gotten more consistent performance from Z.ai. I really liked DeepInfra, but something doesn't seem right over there at the moment.