dot_treo an hour ago

Just getting it into a GGUF file would be fairly trivial. But actually using that GGUF file would need a bunch of additional work. Someone would need to create a new architecture in llama.cpp derived from Qwen3, and then probably adapt the speculative decoding functionality.

At the moment, not even MTP is merged into llama.cpp, so I wouldn't hold my breath for it.