butILoveLife | 3 hours ago
> I run quantized 70B models locally (M2 Max 96GB, llama.cpp + LiteLLM), and memory bandwidth is always the bottleneck.

I imagine you got 96GB because you thought you'd be running models locally? Did you not know the phrase "Unified Memory" is marketing speak?
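For anyone wondering why bandwidth dominates: during autoregressive decode, every generated token has to stream essentially the full set of weights from memory, so a rough upper bound on decode speed is bandwidth divided by model size. A minimal back-of-envelope sketch; the ~400 GB/s figure for the M2 Max and the ~40 GB size for a 4-bit 70B quant are assumptions, not measured numbers:

    # Back-of-envelope: per-token decode speed is bounded by how fast the
    # weights can be streamed from memory, since each token reads them once.
    bandwidth_gb_s = 400  # assumed usable M2 Max memory bandwidth
    model_size_gb = 40    # assumed 70B model at ~4-bit quantization

    tokens_per_sec = bandwidth_gb_s / model_size_gb
    print(f"~{tokens_per_sec:.0f} tokens/sec upper bound")  # ~10 tokens/sec

At roughly 10 tokens/sec you are memory-bound long before the GPU cores saturate, which is why quantizing harder (smaller weights to stream) helps more than raw compute.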