zmmmmm 8 hours ago

Curious what would be the most minimal reasonable hardware one would need to deploy this locally?

NitpickLawyer 8 hours ago | parent | next [-]

I parsed "reasonable" as having reasonable speed to actually use this as intended (in agentic setups). In that case, it's a minimum of $70-100k for hardware (8x 6000 PRO plus all the other pieces to make it work). The model ships with a native INT4 quant, so ~600GB for the weights alone. An 8x 96GB setup would leave you ~160GB for KV caching.
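
For what it's worth, the memory arithmetic behind those numbers is roughly (figures are the rough ones from above, not exact specs):

```python
# Back-of-the-envelope VRAM budget for the 8-GPU setup described above.
NUM_GPUS = 8
VRAM_PER_GPU_GB = 96      # per-card memory on the 96GB workstation cards
WEIGHTS_GB = 600          # native INT4 weights, per the thread

total_vram_gb = NUM_GPUS * VRAM_PER_GPU_GB   # pooled memory across cards
kv_budget_gb = total_vram_gb - WEIGHTS_GB    # what's left for KV cache

print(total_vram_gb)   # 768
print(kv_budget_gb)    # 168 (quoted as "~160GB" above)
```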

You can of course "run" this on cheaper hardware, but the speeds will not be suitable for actual use (i.e. minutes for a simple prompt, tens of minutes per turn in high-context sessions).

simonw 6 hours ago | parent | prev | next [-]

Models of this size can usually be run using MLX on a pair of 512GB Mac Studio M3 Ultras, which are about $10,000 each, so $20,000 for the pair.

PlatoIsADisease an hour ago | parent [-]

You might want to clarify that this is more of a "look, it technically works" than an "I actually use this."

The difference between waiting 20 minutes for the model to answer the prompt '1+1=' and actually using it for something useful is massive here. I wonder where this idea of running AI on CPU comes from. Was it Apple astroturfing? Apple fanboys? I don't see people wasting time on non-Apple CPUs. (Although I did do this for a 7B model.)

tucnak 15 minutes ago | parent [-]

The Mac Studio way is not "AI on CPU": the M2/M4 are complex SoCs that include a GPU with unified memory access.
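
And since decoding is memory-bandwidth bound, you can sketch a rough ceiling on tokens/s. The inputs here are my assumptions, not from the thread: a MoE model with ~30B active parameters per token at INT4 (~15 GB streamed per token) and the M3 Ultra's advertised ~819 GB/s unified memory bandwidth.

```python
# Decode speed upper bound: tokens/s <= bandwidth / bytes read per token.
# All inputs below are assumptions for illustration, not measured figures.
BANDWIDTH_GB_S = 819      # M3 Ultra advertised unified memory bandwidth
ACTIVE_PARAMS_B = 30      # assumed active parameters per token (billions)
BYTES_PER_PARAM = 0.5     # INT4 weights

gb_per_token = ACTIVE_PARAMS_B * BYTES_PER_PARAM       # 15 GB per token
print(round(BANDWIDTH_GB_S / gb_per_token, 1))         # 54.6 tok/s ceiling
```

Real-world throughput lands well below that ceiling (attention, KV reads, prompt processing), but it shows why a unified-memory GPU is a different beast from a pure-CPU box.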

tosh 6 hours ago | parent | prev [-]

I think you can put a bunch of Apple silicon Macs with enough RAM together, e.g. in an office or coworking space.

800-1000 GB of RAM, perhaps?
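
Quick sanity check on how many machines that takes (the KV-cache headroom figure is my assumption; weights and per-machine memory are from the thread):

```python
import math

WEIGHTS_GB = 600       # native INT4 weights, per the thread
KV_HEADROOM_GB = 200   # assumed allowance for KV cache + runtime overhead
PER_MAC_GB = 512       # top-spec Mac Studio unified memory

total_gb = WEIGHTS_GB + KV_HEADROOM_GB            # ~800 GB, matching the guess
machines = math.ceil(total_gb / PER_MAC_GB)
print(machines)   # 2
```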