Remix clone Hacker News


	▲	mohsen1 7 hours ago
		The thing is, GLM 4.7 is easily doing the work Opus was doing for me, but to run it fully you'll need much bigger hardware than a Mac Studio. $10k buys you a lot of API calls from z.ai or Anthropic. It's just not economically viable to run a good model at home.
	▲	zozbot234 6 hours ago
		You can cluster Mac Studios over Thunderbolt connections and enable RDMA for distributed inference. This will be slower than a single node, but it is still the best bang for the buck for inference on very large models.
	▲	mitjam 6 hours ago
		True. I think local inference is still far more expensive for my use case, because of batching effects and my relatively sporadic, hourly usage. That said, I also didn't expect hardware prices (RTX 5090, RAM) to rise this quickly.
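
The cost argument in this thread can be made concrete with a rough break-even calculation. The sketch below is only an illustration: the $10k hardware budget comes from the thread, while the API price per million tokens and the monthly token volume are placeholder assumptions, not real quotes from z.ai or Anthropic. It also ignores electricity, depreciation, and any difference in speed or model quality.

```python
def break_even_tokens(hardware_cost_usd, api_price_per_mtok_usd):
    """Number of API tokens that the hardware budget would pay for."""
    return hardware_cost_usd / api_price_per_mtok_usd * 1_000_000


def months_to_break_even(hardware_cost_usd, api_price_per_mtok_usd, tokens_per_month):
    """Months of usage before API spend would equal the hardware cost."""
    total = break_even_tokens(hardware_cost_usd, api_price_per_mtok_usd)
    return total / tokens_per_month


# $10k is the figure from the thread; the other two values are hypothetical.
HARDWARE_COST_USD = 10_000
ASSUMED_API_PRICE_PER_MTOK_USD = 5.0    # placeholder blended price
ASSUMED_TOKENS_PER_MONTH = 50_000_000   # placeholder sporadic usage

tokens = break_even_tokens(HARDWARE_COST_USD, ASSUMED_API_PRICE_PER_MTOK_USD)
months = months_to_break_even(
    HARDWARE_COST_USD, ASSUMED_API_PRICE_PER_MTOK_USD, ASSUMED_TOKENS_PER_MONTH
)
print(f"Break-even volume: {tokens:,.0f} tokens")
print(f"Break-even time:   {months:.1f} months")
```

Plugging in your own price and volume shows how sensitive the result is to usage: the lower and more sporadic the monthly volume, the longer the hardware takes to pay for itself.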