bitmasher9 6 hours ago

Nvidia will sell you an entire server rack ready for inference. Or maybe you can roll your own Blackwell-based system.

We’re approaching a world where running a premier frontier model is possible on a workstation. Nvidia’s next generation will probably include something under $30k that looks like a desktop. It sounds expensive, until you look at your Anthropic bill.

The unit economics for open models are similar to cloud computing. You can save a ton on expenses by buying the hardware, but it requires a lot of in-house expertise, and you get the most value if you keep the system operating around the clock. The big kink is that open models are usually two quarters behind frontier, and your competitors are probably trying to get access to Mythos.

program_whiz 4 hours ago | parent [-]

"Approaching" is doing some work there. $30K today will get you 90-144GB of usable VRAM with solid system RAM, disk, and CPU. A single B200 chip at 180GB is $40K. Unfortunately that is nowhere close to being able to run a 750B param model. For something like that, we're getting closer to 1TB VRAM (8+ H200/B200), and then a 1M-token context KV cache adds many more GB on top of that.
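For anyone who wants to check the memory math, here is a rough sketch. It only counts weights and KV cache, and the layer count, KV head count, and head dimension are hypothetical placeholders, not the specs of any real 750B model. The KV formula assumes plain attention with a key and a value tensor per layer, so treat the output as ballpark only.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # 1e9 params per billion and 1e9 bytes per GB cancel out.
    return params_billions * bytes_per_param


def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """KV cache size in GB for one sequence of `tokens` tokens."""
    # The leading 2 covers one key tensor plus one value tensor per layer.
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * tokens
    return total_bytes / 1e9


if __name__ == "__main__":
    # Weights for a 750B-parameter model at common precisions.
    for label, width in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"{label}: {weight_memory_gb(750, width):.0f} GB")

    # Hypothetical architecture: 100 layers, 8 KV heads, head_dim 128.
    cache = kv_cache_gb(tokens=1_000_000, layers=100, kv_heads=8, head_dim=128)
    print(f"1M-token KV cache: {cache:.1f} GB")
```

At 8 bits per weight the weights alone land at 750GB, which is where the "closer to 1TB" figure comes from once you add cache and runtime overhead.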

That's a $500K-$1M+ rig as of now. That's a lot of $200 subscriptions to break even, but reasonable if you are paying Anthropic $25/M tokens. Then of course there's the power, cooling, and maintenance to consider...
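A quick sketch of the break-even arithmetic, using only the hardware price. It assumes the $200 subscription is billed monthly and leaves out power, cooling, and maintenance, so the real break-even point is further out than these numbers suggest.

```python
def breakeven_subscription_months(rig_cost: float, sub_price: float = 200.0) -> float:
    """Subscription-months whose total cost equals the rig's purchase price."""
    return rig_cost / sub_price


def breakeven_million_tokens(rig_cost: float, price_per_million: float = 25.0) -> float:
    """Millions of API tokens whose total cost equals the rig's purchase price."""
    return rig_cost / price_per_million


if __name__ == "__main__":
    # Low and high end of the rig price range quoted above.
    for rig in (500_000, 1_000_000):
        months = breakeven_subscription_months(rig)
        mtok = breakeven_million_tokens(rig)
        print(f"${rig:,}: {months:,.0f} subscription-months, "
              f"or {mtok / 1000:,.0f}B tokens at $25/M")
```

So a $500K rig is 2,500 subscription-months, or about 20B tokens at API pricing.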

But yeah, I can see that if prices come down 10x in a few years, or crater after the bubble, $30-40K might get you a decent machine.

zozbot234 2 hours ago | parent [-]

> Unfortunately that is nowhere close to being able to run a 750B param model. For something like that, we're getting closer to 1TB VRAM

You don't have to run a model from VRAM, or even from a sizeable amount of RAM. These choices only ever make sense when serving the model at scale, to hundreds of simultaneous users or more.

bitmasher9 an hour ago | parent [-]

For workstation inference, a unified memory architecture would be a good cost/performance balance, while keeping COGS reasonable.

512GB unified-memory Macs are available, with the RAM upgrade costing a few grand.