zmmmmm 8 hours ago
Curious: what is the minimum reasonable hardware one would need to deploy this locally?
NitpickLawyer 8 hours ago
I parsed "reasonable" as having reasonable speed to actually use this as intended (in agentic setups). In that case, it's a minimum of $70-100k for hardware (8x 6000 PRO plus all the other pieces to make it work). The model comes with a native INT4 quant, so ~600GB for the weights alone. An 8x 96GB setup (768GB total) would leave you ~160GB for KV caching. You can of course "run" this on cheaper hardware, but the speeds will not be suitable for actual use (i.e. minutes for a simple prompt, tens of minutes per turn for high-context sessions).
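The memory arithmetic in the comment above can be sketched as follows. This is a rough back-of-the-envelope calculation using only the approximate figures quoted there (8 GPUs, 96GB each, ~600GB of INT4 weights). The function name `kv_cache_headroom_gb` is made up for illustration, and the estimate ignores activations and runtime overhead, so real headroom would be somewhat lower.

```python
# Back-of-the-envelope memory budget using the approximate figures quoted above.
# All values are in GB; the helper name is hypothetical, not from any library.

def kv_cache_headroom_gb(num_devices: int, gb_per_device: float, weights_gb: float) -> float:
    """Memory left after loading the weights, ignoring activations and overhead."""
    return num_devices * gb_per_device - weights_gb

total = 8 * 96
headroom = kv_cache_headroom_gb(num_devices=8, gb_per_device=96, weights_gb=600)
print(f"Total VRAM: {total} GB, headroom for KV cache: {headroom:.0f} GB")
```

Under these assumptions the headroom comes out a little under 170GB, consistent with the "~160GB" estimate once some overhead is allowed for.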
simonw 6 hours ago
Models of this size can usually be run using MLX on a pair of 512GB Mac Studio M3 Ultras, which cost about $10,000 each, so $20,000 for the pair.
tosh 6 hours ago
I think you could put together a bunch of Apple silicon Macs with enough RAM, e.g. in an office or coworking space. Perhaps 800-1000 GB of RAM in total?