butILoveLife | 3 hours ago
> I run quantized 70B models locally (M2 Max 96GB, llama.cpp + LiteLLM), and memory bandwidth is always the bottleneck.

I imagine you got 96GB because you thought you'd be running models locally? Did you not know the phrase "Unified Memory" is marketing speak?
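For anyone wondering why bandwidth dominates: during autoregressive decode, every generated token has to stream essentially the full set of weights from memory, so a rough upper bound on decode speed is bandwidth divided by model size. A minimal back-of-envelope sketch; the ~400 GB/s figure for the M2 Max and the ~40 GB size for a 4-bit 70B quant are assumptions, not measured numbers:

    # Back-of-envelope: per-token decode speed is bounded by how fast the
    # weights can be streamed from memory, since each token reads them once.
    bandwidth_gb_s = 400  # assumed usable M2 Max memory bandwidth
    model_size_gb = 40    # assumed 70B model at ~4-bit quantization

    tokens_per_sec = bandwidth_gb_s / model_size_gb
    print(f"~{tokens_per_sec:.0f} tokens/sec upper bound")  # ~10 tokens/sec

At roughly 10 tokens/sec you are memory-bound long before the GPU cores saturate, which is why quantizing harder (smaller weights to stream) helps more than raw compute.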