mohsen1 7 hours ago

The thing is, GLM 4.7 easily does the work Opus was doing for me, but to run it fully you'll need much bigger hardware than a Mac Studio. $10k buys you a lot of API calls from z.ai or Anthropic. It's just not economically viable to run a good model at home.
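
Back-of-envelope, with placeholder numbers (the per-token price below is an assumption, not a quote from either provider):

    # Break-even between buying hardware and paying per token.
    HARDWARE_COST = 10_000.0      # USD, e.g. a maxed-out Mac Studio
    API_PRICE_PER_MTOK = 5.0      # assumed blended $/1M tokens (in+out)

    break_even_tokens = HARDWARE_COST / API_PRICE_PER_MTOK * 1_000_000
    print(f"break-even at ~{break_even_tokens / 1e9:.0f}B tokens")
    # ~2B tokens at these numbers, i.e. years of heavy personal use,
    # and that's before electricity, depreciation, and the fact that
    # the box may not even fit the model you actually want.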

zozbot234 6 hours ago

You can cluster Mac Studios over Thunderbolt and enable RDMA for distributed inference. That will be slower than a single node, but it's still the best bang for the buck for running inference on very large models.
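
A rough sketch of why it's slower, assuming 2-way tensor parallelism with two all-reduces per layer (model width, layer count, and link latency are all made-up illustrative numbers):

    hidden_dim = 8192       # assumed model width
    bytes_fp16 = 2
    layers     = 92         # assumed layer count
    link_gbps  = 80.0       # Thunderbolt 5 headline rate
    link_lat_s = 50e-6      # assumed per-message latency on the link

    msgs_per_tok  = 2 * layers                          # all-reduces per token
    bytes_per_tok = msgs_per_tok * hidden_dim * bytes_fp16

    bw_ceiling  = (link_gbps / 8 * 1e9) / bytes_per_tok  # tokens/s
    lat_ceiling = 1 / (msgs_per_tok * link_lat_s)        # tokens/s

    print(f"bandwidth-bound ceiling: {bw_ceiling:,.0f} tok/s")
    print(f"latency-bound ceiling:   {lat_ceiling:,.0f} tok/s")
    # ~3,300 tok/s from bandwidth but only ~100 tok/s from latency at
    # these numbers: latency is the bottleneck, which is exactly what
    # RDMA helps with.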

mitjam 6 hours ago

True, though I think local inference is still far more expensive for my use case, because providers benefit from batching and my usage is relatively sporadic, a few sessions an hour at most. That said, I also didn't expect hardware prices (RTX 5090, RAM) to rise this quickly.
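
A toy utilization model of that batching point (local throughput, API price, and hardware lifetime are all assumptions): the lower your duty cycle, the worse the amortized local cost per token gets.

    hw_cost        = 10_000.0   # upfront, USD
    lifetime_years = 3.0
    local_tok_s    = 30.0       # assumed single-user decode speed
    api_per_mtok   = 5.0        # assumed blended API price, $/1M tokens

    seconds = lifetime_years * 365 * 24 * 3600
    for duty in (1.0, 0.10, 0.01):  # fraction of time actually generating
        toks  = seconds * duty * local_tok_s
        local = hw_cost / toks * 1e6
        print(f"duty {duty:5.0%}: local ${local:8.2f}/MTok "
              f"vs API ${api_per_mtok:.2f}/MTok")
    # At 100% duty the box roughly matches the API (~$3.5/MTok here),
    # but at 1% duty it's ~$350/MTok: sporadic use never amortizes.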