thomasjb 3 hours ago

Unfortunately there are no GGUF quants of the assistant model yet: https://huggingface.co/models?other=base_model:quantized:goo...

kristjansson 3 hours ago | parent [-]

I think MTP support for Gemma 4 is still WIP: https://github.com/ggml-org/llama.cpp/pull/23398 ?

dofm 3 hours ago | parent | next [-]

This has been my impression.

The underlying LiteRT-LM framework used in the Edge Gallery does support the MTP drafters for the smaller models, but according to:

https://developers.google.com/edge/litert-lm/models/gemma-4

> Note: LiteRT-LM supports E2B and E4B models today, with support for larger models coming soon.

So even Google aren't shipping MTP support for the 26B and 31B models yet.
