| ▲ | kreelman an hour ago | |
Fantastic results. Well done. So this is built into the way the model works, if I'm understanding it correctly. I was wondering what would be involved in getting it to work with GGUF files rather than safetensors files. | ||
| ▲ | dot_treo an hour ago | |
Getting it into a GGUF file would be fairly trivial, but actually using that GGUF file would need a bunch of additional work. One would need to create a new architecture derived from Qwen3, and then probably adapt the speculative decoding functionality. At the moment not even MTP (multi-token prediction) is merged into llama.cpp, so I wouldn't hold my breath for it. | ||
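To illustrate why the file side is the easy part, here is a minimal sketch that writes and reads back only the fixed GGUF header plus a single `general.architecture` metadata entry, using nothing but the standard library. The layout follows the public GGUF v3 description as I understand it (little-endian magic, version, tensor count, key/value count, then length-prefixed strings). The architecture name `qwen3-custom` and the helper names are made up for the example; a real conversion would also need tensor infos, alignment padding, tensor data and tokenizer metadata, none of which is shown here.

```python
import struct

GGUF_MAGIC = b"GGUF"       # first four bytes of every GGUF file
GGUF_VERSION = 3           # assumed: current format version
GGUF_TYPE_STRING = 8       # metadata value type id for strings


def _gguf_string(text: str) -> bytes:
    """Encode a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<Q", len(data)) + data


def build_header(architecture: str) -> bytes:
    """Build a header with zero tensors and one metadata key/value pair."""
    # version (uint32), tensor_count (uint64), metadata_kv_count (uint64)
    fixed = GGUF_MAGIC + struct.pack("<IQQ", GGUF_VERSION, 0, 1)
    kv = (
        _gguf_string("general.architecture")
        + struct.pack("<I", GGUF_TYPE_STRING)
        + _gguf_string(architecture)
    )
    return fixed + kv


def read_architecture(blob: bytes) -> str:
    """Read back the architecture name written by build_header."""
    if blob[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF blob")
    _version, _n_tensors, n_kv = struct.unpack_from("<IQQ", blob, 4)
    if n_kv < 1:
        raise ValueError("no metadata entries")
    offset = 4 + 4 + 8 + 8  # magic + version + two uint64 counts
    (key_len,) = struct.unpack_from("<Q", blob, offset)
    offset += 8
    key = blob[offset:offset + key_len].decode("utf-8")
    offset += key_len
    (value_type,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    if key != "general.architecture" or value_type != GGUF_TYPE_STRING:
        raise ValueError("unexpected first metadata entry")
    (value_len,) = struct.unpack_from("<Q", blob, offset)
    offset += 8
    return blob[offset:offset + value_len].decode("utf-8")


if __name__ == "__main__":
    # "qwen3-custom" is a hypothetical architecture name, not a real one.
    header = build_header("qwen3-custom")
    print(read_architecture(header))
```

The architecture string is just a label in the file. The runtime still has to know how to build the compute graph for that label, which is the new-architecture and speculative-decoding work described above.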