Remix clone Hacker News

	▲	kreelman an hour ago
		Fantastic results. Well done. So, if I'm understanding it correctly, this is built into the way the model works. I was wondering what would be involved in getting it to work with GGUF files rather than safetensors files...
	▲	dot_treo an hour ago
		Just getting it into a GGUF file would be fairly trivial, but actually using that GGUF file would need a bunch of additional things. One would need to create a new architecture derived from Qwen3, and then probably adapt the speculative decoding functionality. At the moment not even MTP (multi-token prediction) is merged into llama.cpp, so I wouldn't hold my breath for it.
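
		For anyone unfamiliar with the term, here is a rough toy sketch of what generic greedy speculative decoding does: a cheap draft model proposes a few tokens and the target model checks them. This is not llama.cpp's implementation and not the scheme from the linked work; `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical names made up for illustration, and the "models" are plain functions over integer tokens.

```python
from typing import Callable, List

Token = int
# A "model" here is just a function mapping a token context to the next greedy token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> List[Token]:
    """Greedy draft-and-verify loop; output matches plain greedy decoding with `target`."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1) The draft model proposes up to k tokens autoregressively.
        n = min(k, goal - len(out))
        proposal: List[Token] = []
        for _ in range(n):
            proposal.append(draft(out + proposal))

        # 2) The target model checks each proposed position. Real implementations
        #    do this in one batched forward pass; the toy version calls it per token.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # equals tok on a match, else the correction
            if expected != tok:
                break  # discard the rest of the draft after the first mismatch

        out.extend(accepted)
    return out


def toy_target(ctx: List[Token]) -> Token:
    # Hypothetical target model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Hypothetical draft model: agrees with the target except after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


result = speculative_decode(toy_target, toy_draft, [0], max_new=8)
```

		The point of the sketch is that every accepted draft token saves a sequential step of the big model, which is why the runtime has to understand both the extra heads or draft model and the verification loop, not just the file format.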