This, and always check benchmarks instead of assuming memory bandwidth is the only possible bottleneck. Apple Silicon definitely does not fully use its advertised memory bandwidth when running LLMs.
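
For what it's worth, here is a rough way to sanity-check this yourself. It's a minimal sketch that assumes decoding reads every weight once per token, so advertised bandwidth divided by model size is only a ceiling, not a prediction. The function names and all numbers below are made-up placeholders; plug in your own model size, your chip's advertised bandwidth, and a tokens/sec figure from an actual benchmark run.

```python
def bandwidth_bound_tokens_per_sec(model_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Ceiling on decode speed if each token reads all weights exactly once."""
    return bandwidth_bytes_per_sec / model_bytes


def effective_bandwidth_fraction(measured_tokens_per_sec: float,
                                 model_bytes: float,
                                 bandwidth_bytes_per_sec: float) -> float:
    """Fraction of advertised bandwidth implied by a measured decode speed."""
    return measured_tokens_per_sec * model_bytes / bandwidth_bytes_per_sec


GB = 1e9

# Hypothetical placeholder values, NOT measurements of any real machine.
model_bytes = 4 * GB             # size of the quantized weights on disk/in memory
advertised_bandwidth = 400 * GB  # vendor-advertised bytes per second
measured_tps = 60.0              # tokens/sec taken from your own benchmark

ceiling_tps = bandwidth_bound_tokens_per_sec(model_bytes, advertised_bandwidth)
fraction = effective_bandwidth_fraction(measured_tps, model_bytes, advertised_bandwidth)

print(f"bandwidth-bound ceiling: {ceiling_tps:.1f} tok/s")
print(f"implied bandwidth use:   {fraction:.0%}")
```

If the implied fraction comes out well below 100%, something other than raw memory bandwidth (compute, kernel efficiency, scheduling overhead) is limiting you, which is exactly why the measured number matters more than the spec sheet.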