visarga a day ago
MacBook M5 with 64GB: it can run gemma-4-26b-a4b-it-4bit and Qwen3.6-35B-A3B-4bit at about 1500 tps prefill and 45 tps decode on contexts up to 100K tokens using MLX. That's faster than Claude, which really surprised me, and for gemma4 the chat quality is similar to Claude as well. Agentic use works but does not compare to cloud models; you can still build agents where the top-level control flow is code.
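To put those rates in perspective at long context, here is a back-of-envelope sketch using only the figures quoted above (about 1500 tps prefill, 45 tps decode). The function name `estimate_latency_s` and the 500-token reply length are illustrative assumptions, and the model assumes throughput stays constant across the whole context, which is a simplification.

```python
def estimate_latency_s(prompt_tokens, output_tokens,
                       prefill_tps=1500.0, decode_tps=45.0):
    """Rough wall-clock estimate for one request.

    Defaults are the rates quoted in the comment above; treating them as
    constant over the full context is an assumption, not a measurement.
    """
    prefill_s = prompt_tokens / prefill_tps   # time to ingest the prompt
    decode_s = output_tokens / decode_tps     # time to generate the reply
    return prefill_s, decode_s, prefill_s + decode_s


# Hypothetical request: a full 100K-token prompt and a 500-token reply.
prefill_s, decode_s, total_s = estimate_latency_s(100_000, 500)
print(f"prefill {prefill_s:.1f}s, decode {decode_s:.1f}s, total {total_s:.1f}s")
# prints "prefill 66.7s, decode 11.1s, total 77.8s"
```

So under these assumptions a cold 100K-token prompt is dominated by prefill (over a minute), while short prompts are dominated by decode speed.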
mzubairtahir a day ago
Sorry to ask again, but how much memory is actually usable by the GPU on a MacBook, given that it is shared (the OS and apps have to use the same memory)? And is that different from dedicated GPU memory?