root_axis 6 hours ago

> new hardware runs $4k-$10k last I checked

Starting closer to $40k if you want something that's practical. $10k can't run anything worthwhile for SDLC at useful speeds.

zozbot234 6 hours ago | parent [-]

$10K should be enough to pay for a 512GB RAM machine, which, combined with partial SSD offload for the remaining memory requirements, should be able to run SOTA models like DS4-Pro or Kimi 2.6 at workable speed. It depends on whether MoE expert selection has enough locality over time that the SSD-offload part ends up being a minor factor.

(If you are willing to let the machine work mostly overnight/unattended, with only incidental and sporadic human intervention, you could even decrease that memory requirement a bit.)

SwellJoe 5 hours ago | parent [-]

You can't put "SSD offload" and "workable speed" in the same sentence.

zozbot234 4 hours ago | parent [-]

As a typical example, DeepSeek v4-pro has 59B active params, mostly at FP4 size, so it needs to "find" around 30GB worth of params in RAM per inferred token. On a 512GB machine, most of those params will actually be cached in RAM (model size on disk is around 862GB, so roughly half the weights fit once you account for OS and KV-cache overhead). Assuming for the sake of argument that MoE expert selection is completely random and unpredictable, around 15GB in total then has to be fetched from storage per token. If expert selection is not completely random and there's enough locality, that figure improves quite a bit and inference becomes quite workable.
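The arithmetic above can be sketched in a few lines. The usable-RAM-cache and SSD-bandwidth figures below are my own assumptions for illustration, not numbers from the thread:

```python
# Back-of-envelope estimate of per-token SSD traffic for MoE inference
# with a partial RAM cache, assuming random expert selection (worst case).
# Locality in expert selection over time would lower the miss rate.

BYTES_PER_PARAM = 0.5       # FP4 = 4 bits per parameter
active_params = 59e9        # active params per token (from the comment)
model_size_gb = 862.0       # total weights on disk (from the comment)
ram_cache_gb = 450.0        # assumed weight cache: 512GB minus OS/KV overhead
ssd_bw_gbps = 7.0           # assumed NVMe sequential read bandwidth, GB/s

active_gb = active_params * BYTES_PER_PARAM / 1e9   # ~29.5 GB touched/token
miss_rate = 1 - ram_cache_gb / model_size_gb        # fraction not in RAM
ssd_fetch_gb = active_gb * miss_rate                # ~14 GB read from SSD/token

print(f"per-token SSD fetch: {ssd_fetch_gb:.1f} GB, "
      f"~{ssd_fetch_gb / ssd_bw_gbps:.1f} s/token at {ssd_bw_gbps} GB/s")
```

Even in this worst case the fetch is bandwidth-bound on the order of seconds per token, which is why any expert-selection locality (reusing cached experts across consecutive tokens) matters so much for making this "workable".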