I had discounted Edge Gallery because it didn't support system prompts, but now it does, so I will give it another go. I believe the implementation does use MTP, since I got an update to Gemma-4-E4B on iOS indicating as much, and on macOS it's very speedy.

However, on my 18GB RAM MacBook Pro, selecting Gemma-4-12B-it results in this error:

> The model "Gemma-4-12B-it' requires more memory (RAM) than is available on your device.

So yeah, my questions about the 16GB marketing copy are fair.

▲ dofm 8 days ago | parent | next [-]

I wonder if they were just slightly ahead of this announcement?

https://blog.google/innovation-and-ai/technology/developers-...

Looks like the 12B model should fit now?

▲ minimaxir 7 days ago | parent [-]

It definitely works in LM Studio, not Edge Gallery yet.

▲ dofm 6 days ago | parent [-]

Following up again, in case you see this. I was just in oMLX trying to set up the new 26B QAT models with MTP, and I noticed this message:

`Kernel iogpu.wired_limit_mb is only 48.0 GB; oMLX can only allocate up to 48.0 GB. Raise it in Terminal: sudo sysctl iogpu.wired_limit_mb=59392`

Perhaps if you can increase the wired limit it will fit?
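For what it's worth, the number in that message looks like plain GB times 1024 (59392 = 58 × 1024), so a rough sketch for working out a value for a different machine might look like this. The helper names are made up for illustration, and the 1024-MB-per-GB convention is an assumption inferred from the message rather than anything documented here. How much to leave for macOS itself is a judgment call I can't vouch for.

```python
# Hypothetical helpers: build the sysctl command quoted in the oMLX message.
# Assumes the message counts 1 GB as 1024 MB, which matches 59392 = 58 * 1024.

def wired_limit_mb(gb: int) -> int:
    """Convert a whole number of GB to the MB value passed to sysctl."""
    return gb * 1024


def sysctl_command(gb: int) -> str:
    """Format the command string; this only prints it, it does not run it."""
    return f"sudo sysctl iogpu.wired_limit_mb={wired_limit_mb(gb)}"


if __name__ == "__main__":
    # 58 GB reproduces the value suggested in the message above.
    print(sysctl_command(58))
```

Passing 58 gives back the same command the message suggests; the 48.0 GB it reports as the current limit would be 49152 under the same convention.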

▲ dofm 10 days ago | parent | prev [-]

Interesting; they may have fluffed up somewhere then.

(Though perhaps it'll squeeze in with a small context window? I'm not sure I understand that aspect yet.)

It does seem to use MTP, yes, and it is quite quick — seemingly the underlying LiteRT stuff can do MTP with Gemma 4 and presumably MTP is a big part of the practicality picture here.

The system prompt thing was a surprise when I poked around.