Remix clone Hacker News

	▲	est 6 hours ago
		I really want to know what M, K, XL, and XS mean in this context and how to choose. I searched all of the unsloth docs and there seems to be no explanation at all.
	▲	tredre3 3 hours ago
		Q4_K is a type of quantization. It means that all weights will be at a minimum 4 bits using the K method. But if you're willing to give more bits to only certain important weights, you get to preserve a lot more quality for not that much more space. The S/M/L/XL is what tells you how many tensors get to use more bits.
		The difference between S and M is generally noticeable (on benchmarks). The difference between M and L/XL is less so, let alone in real use (ymmv).
		Here's an example of the contents of a Q4_K_:
		S
		llama_model_loader: - type f32: 392 tensors
		llama_model_loader: - type q4_K: 136 tensors
		llama_model_loader: - type q5_0: 43 tensors
		llama_model_loader: - type q5_1: 17 tensors
		llama_model_loader: - type q6_K: 15 tensors
		llama_model_loader: - type q8_0: 55 tensors
		M
		llama_model_loader: - type f32: 392 tensors
		llama_model_loader: - type q4_K: 106 tensors
		llama_model_loader: - type q5_0: 32 tensors
		llama_model_loader: - type q5_K: 30 tensors
		llama_model_loader: - type q6_K: 15 tensors
		llama_model_loader: - type q8_0: 83 tensors
		L
		llama_model_loader: - type f32: 392 tensors
		llama_model_loader: - type q4_K: 106 tensors
		llama_model_loader: - type q5_0: 32 tensors
		llama_model_loader: - type q5_K: 30 tensors
		llama_model_loader: - type q6_K: 14 tensors
		llama_model_loader: - type q8_0: 84 tensors
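		If you want to see exactly which tensor types move between two variants, a rough sketch like the one below can tally the loader lines and diff them. It only uses the S and M counts quoted in this comment; the names `tensor_type_counts` and `diff_counts` are made up for illustration and are not part of llama.cpp. Keep in mind it counts tensors, not weights, so it will not translate directly into file size, since tensors differ in size.

```python
import re
from collections import Counter

# Matches loader lines such as "llama_model_loader: - type q4_K: 136 tensors".
LINE_RE = re.compile(r"llama_model_loader: - type\s+(\w+):\s+(\d+) tensors")


def tensor_type_counts(log_text):
    """Map each tensor type found in the loader output to its tensor count."""
    return Counter({name: int(count) for name, count in LINE_RE.findall(log_text)})


def diff_counts(before, after):
    """Return {type: after - before} for every type whose count changed."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        delta = after.get(name, 0) - before.get(name, 0)
        if delta != 0:
            changed[name] = delta
    return changed


# Loader output copied from the comment above (S and M variants).
Q4_K_S_LOG = """
llama_model_loader: - type f32: 392 tensors
llama_model_loader: - type q4_K: 136 tensors
llama_model_loader: - type q5_0: 43 tensors
llama_model_loader: - type q5_1: 17 tensors
llama_model_loader: - type q6_K: 15 tensors
llama_model_loader: - type q8_0: 55 tensors
"""

Q4_K_M_LOG = """
llama_model_loader: - type f32: 392 tensors
llama_model_loader: - type q4_K: 106 tensors
llama_model_loader: - type q5_0: 32 tensors
llama_model_loader: - type q5_K: 30 tensors
llama_model_loader: - type q6_K: 15 tensors
llama_model_loader: - type q8_0: 83 tensors
"""

s_counts = tensor_type_counts(Q4_K_S_LOG)
m_counts = tensor_type_counts(Q4_K_M_LOG)
s_to_m = diff_counts(s_counts, m_counts)

for name, delta in s_to_m.items():
    print(f"{name}: {delta:+d} tensors")
```

		On these numbers, going from S to M takes tensors out of q4_K, q5_0 and q5_1 and puts them into q5_K and q8_0, while the total tensor count stays the same.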
	▲	huydotnet 6 hours ago
		They are different quantization types. You can read more here: https://huggingface.co/docs/hub/gguf#quantization-types
	▲	arcanemachiner 32 minutes ago
		Just start with q4_k_m and figure out the rest later.