Remix clone Hacker News


	▲	discordance 5 hours ago
		Could you please share your time to first token and tok/s?
	▲	isomorphic 2 hours ago
		M4 Pro 64GB (14 CPU / 20 GPU), Gemma 4 31B Q4_K_M GGUF, LM Studio: time to first token 0.92s, 11.56 tokens/s. Edit: For comparison with the other poster, same setup as above, but with Gemma 4 31B Instruct 8bit MLX (not sure if exactly the same model): time to first token 4.62s, 7.20 tokens/s; with a different prompt, 1.17s and 7.24 tokens/s.
	▲	ls612 4 hours ago
		I’m on an M2 Max and get 10 tok/s with Gemma 4 8bit MLX
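
To compare the two M4 Pro runs quoted above on an end-to-end basis, here is a rough sketch that combines time to first token with throughput. It assumes a 500-token reply (a made-up length, not from the thread) and that the reported tokens/s applies uniformly once streaming starts; `total_seconds` is a hypothetical helper, not part of LM Studio or MLX.

```python
def total_seconds(ttft_s: float, tokens_per_s: float, n_tokens: int) -> float:
    """Estimate wall-clock time for a reply: wait for the first token, then stream."""
    return ttft_s + n_tokens / tokens_per_s


# (time to first token in s, tokens/s) as quoted in the thread for the M4 Pro 64GB.
runs = {
    "Gemma 4 31B Q4_K_M GGUF": (0.92, 11.56),
    "Gemma 4 31B Instruct 8bit MLX": (4.62, 7.20),
}

N_TOKENS = 500  # assumed reply length, not a figure from the thread

for name, (ttft, tps) in runs.items():
    print(f"{name}: ~{total_seconds(ttft, tps, N_TOKENS):.1f}s for {N_TOKENS} tokens")
```

For replies of this length the throughput term dominates, so the gap in tokens/s matters more than the gap in time to first token.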