Remix clone Hacker News

new | show | ask | jobs Github

	▲	kristianp 5 hours ago
		ExLlamaV3 EXL3 2bpw is likely the 30b parameter GLM 4.7 Flash quantised down to 2 bits, the unstated assumption is that you need to check the 2bpw quantisation works well enough for your use case. The reported size of the ModelOpt FP8, 16 GB, sounds wrong to me. If its 8 bits per parameter it is going to be a similar size to the glm-4.7-flash:q8_0. They repeat this a few times in the readme.