kristianp 5 hours ago

ExLlamaV3 EXL3 2bpw is likely the 30B-parameter GLM 4.7 Flash quantised down to 2 bits per weight. The unstated caveat is that you need to check whether a 2bpw quantisation works well enough for your use case.

The reported size of the ModelOpt FP8 build, 16 GB, sounds wrong to me. If it's 8 bits per parameter, it should be roughly the same size as glm-4.7-flash:q8_0, i.e. on the order of 30 GB for 30B parameters. The readme repeats this figure in several places.
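A quick back-of-the-envelope check, as a rough sketch: weight size is roughly parameters × bits per weight / 8. This assumes ~30B parameters (as above), decimal GB, and ignores quantisation scales, layers kept at higher precision, and file metadata, so real files come out somewhat larger. The helper names are just for illustration.

```python
# Rough weight-size estimate: parameters x bits per weight / 8 bits per byte.
# Assumptions (not from any readme): ~30B parameters, decimal GB (1e9 bytes),
# and no overhead for quantisation scales, higher-precision layers, or metadata.

PARAMS = 30e9  # assumed parameter count for GLM 4.7 Flash


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(size_gb: float, params: float) -> float:
    """Bits per weight implied by a reported size in decimal gigabytes."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for label, bpw in [("EXL3 2bpw", 2), ("FP8 / q8_0", 8)]:
        print(f"{label:>12}: ~{weight_size_gb(PARAMS, bpw):.1f} GB")
    # A 16 GB file for 30B parameters works out to only ~4.3 bits per weight.
    print(f"16 GB implies ~{implied_bits_per_weight(16, PARAMS):.2f} bits per weight")
```

So 2bpw lands around 7.5 GB and a genuine 8-bit build around 30 GB; 16 GB would correspond to roughly 4.3 bits per weight, which is why the FP8 figure looks off.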