Remix clone Hacker News


	▲	oceanplexian 3 hours ago
		Doing inference with a Mac Mini to save money is more or less holding it wrong. Of course if you buy some overpriced Apple hardware, it’s going to take years to break even. Buy a couple of real GPUs, do tensor parallelism and concurrent batch requests with vLLM, and it becomes extremely cost competitive to run your own hardware.
	▲	mythz 3 hours ago
		> Doing inference with a Mac Mini to save money is more or less holding it wrong.
		No one's running these large models on a Mac Mini.
		> Of course if you buy some overpriced Apple hardware it’s going to take years to break even.
		Great, where can I find cheaper hardware that can run GLM 5's 745B or Kimi K2.5 1T models? Currently it requires 2x M3 Ultras (1TB VRAM) to run Kimi K2.5 at 24 tok/s [1]. What are the better value alternatives?
		[1] https://x.com/alexocheema/status/2016404573917683754
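For anyone who wants to sanity-check the memory side of this question, here is a rough back-of-the-envelope sketch. It only counts the weights; KV cache, activations, and runtime overhead are ignored, so real requirements are higher. The parameter counts (745B and 1T) come from the comment above. The bit widths and per-device memory sizes are illustrative assumptions, and the function names are made up for this example.

```python
import math


def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits per weight take 1 GB, so the
    result is simply params_billions * bits_per_weight / 8.
    """
    return params_billions * bits_per_weight / 8


def devices_needed(params_billions: float, bits_per_weight: float,
                   device_memory_gb: float) -> int:
    """Minimum number of devices whose combined memory holds the weights.

    Assumes the weights can be split evenly across devices and ignores
    KV cache and other overhead, so treat this as a lower bound.
    """
    total_gb = weight_memory_gb(params_billions, bits_per_weight)
    return math.ceil(total_gb / device_memory_gb)


if __name__ == "__main__":
    # Assumed scenarios: (label, params in billions, bits per weight)
    models = [("745B model", 745, 4), ("1T model", 1000, 4), ("1T model", 1000, 8)]
    # Assumed per-device memory sizes in GB, chosen for illustration only
    devices = [("80 GB device", 80), ("512 GB device", 512)]

    for label, params, bits in models:
        for dev_label, dev_gb in devices:
            n = devices_needed(params, bits, dev_gb)
            print(f"{label} @ {bits}-bit on {dev_label}: at least {n}")
```

The takeaway is only the shape of the arithmetic: total weight memory scales linearly with parameter count and bit width, and the device count is that total divided by per-device memory, rounded up. Price per device and achievable tok/s are separate questions that this sketch does not address.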