discordance 5 hours ago
Could you please share your time to first token and tok/s?
isomorphic 2 hours ago
M4 Pro 64GB (14 CPU / 20 GPU), Gemma 4 31B Q4_K_M GGUF, LM Studio: time to first token 0.92s, 11.56 tokens/s. Edit: For comparison with the other poster, same setup as above, but with Gemma 4 31B Instruct 8bit MLX (not sure if it's exactly the same model): time to first token 4.62s, 7.20 tokens/s; with a different prompt, 1.17s and 7.24 tokens/s.
ls612 4 hours ago
I’m on an M2 Max and get 10 tok/s with Gemma 4 8bit MLX.