Chamix 3 hours ago

Try 10s of trillions. These days everyone is running 4-bit at inference (the flagship feature of Blackwell+), with the big flagship models running on recently installed Nvidia 72-GPU Rubin clusters (and equivalent-ish world size for those rented Ironwood TPUs Anthropic also uses). Let's see: Vera Rubin racks come standard with 20 TB of unified memory (Blackwell NVL72 with 10 TB), and NVFP4 fits 2 parameters per byte...
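To make the napkin math explicit, here's a tiny sketch of the capacity bound being argued. The memory figures (20 TB Rubin NVL72, 10 TB Blackwell NVL72) are the commenter's claims, not verified specs, and the bound ignores KV cache and activation memory, which shrink the real budget:

```python
def max_params_trillions(unified_memory_tb: float, bytes_per_param: float) -> float:
    """Upper bound on parameters that fit in a rack's memory, in trillions."""
    return unified_memory_tb * 1e12 / bytes_per_param / 1e12

# NVFP4 packs 2 parameters per byte -> 0.5 bytes per parameter.
rubin_cap = max_params_trillions(20, 0.5)      # Vera Rubin NVL72: 20 TB claimed
blackwell_cap = max_params_trillions(10, 0.5)  # Blackwell NVL72: 10 TB claimed

# rubin_cap -> 40.0 (trillion params), blackwell_cap -> 20.0
```

So "10s of trillions" is at least consistent with the claimed rack sizes, as a pure memory-capacity ceiling.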

Of course, intense sparsification via MoE (and other techniques ;) ) lets total model size largely decouple from inference speed and cost (within the limits of world size via NVLink/TPU torus caps).

So the real mystery, as always, is the actual parameter count of the activated head(s). You can do various speed benchmarks and TPS tracking across likely hardware fleets, and while an exact number is hard to compute, let me tell you, it is not 17B or anywhere in that particular OOM :)

Comparing Opus 4.6 or GPT 5.4 thinking or Gemini 3.1 pro to any sort of Chinese model (on cost) is just totally disingenuous when China does NOT have Vera Rubin NVL72 GPUs or Ironwood v7 TPUs in any meaningful capacity, and is forced to target 8-GPU Blackwell systems (and worse!) for deployment.

jychang an hour ago | parent | next [-]

Nobody is running models with tens of trillions of parameters in 2026. That's ridiculous.

Opus is 2T-3T in size at most.

aurareturn 2 hours ago | parent | prev [-]

China is targeting H20 because that's all they were officially allowed to buy.

Chamix 2 hours ago | parent [-]

I generally agree; back-of-the-napkin math shows an H20 cluster of 8 GPUs * 96 GB = 768 GB = 768B parameters at FP8 (no NVFP4 on Hopper), which lines up pretty nicely with the sizes of recent open-source Chinese models.
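Spelling out that arithmetic (figures are from the comment; a real deployment also needs headroom for KV cache and activations, so the usable budget is smaller):

```python
gpus = 8
gb_per_gpu = 96       # H20 memory per GPU, per the comment above
bytes_per_param = 1   # FP8: 1 byte per parameter; no NVFP4 on Hopper

total_gb = gpus * gb_per_gpu               # 768 GB per 8-GPU H20 node
max_params_billions = total_gb / bytes_per_param  # ~768B parameters at FP8
```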

However, I'd say it's relatively well assumed in realpolitik land that Chinese labs managed to acquire plenty of H100/H200 clusters, and even meaningful numbers of B200 systems, semi-illicitly before the regulations and anti-smuggling measures really started to crack down.

This does somewhat raise the question of how nicely the closed-source variants, of undisclosed parameter counts, fit within the 1.1 TB of H200 or 1.5 TB of B200 systems.
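Extending the same napkin math to those systems. The per-GPU memory figures here are assumptions on my part (H200 ~141 GB, B200 ~192 GB), consistent with the 1.1 TB and 1.5 TB node totals above, and as before this is a pure capacity ceiling:

```python
def node_param_budget_billions(gpus: int, gb_per_gpu: int, bytes_per_param: float) -> float:
    """Rough parameter ceiling (in billions) for one inference node."""
    return gpus * gb_per_gpu / bytes_per_param

h200_fp8 = node_param_budget_billions(8, 141, 1)    # ~1128B at FP8
b200_fp8 = node_param_budget_billions(8, 192, 1)    # ~1536B at FP8
b200_fp4 = node_param_budget_billions(8, 192, 0.5)  # ~3072B at NVFP4 (Blackwell only)
```

By this count, a ~1T-class closed model would fit a single 8-GPU H200 node at FP8, and roughly double that fits a B200 node at 4-bit.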

aurareturn 2 hours ago | parent [-]

They do not have enough H200 or Blackwell systems to serve 1.6 billion people and the world, so I doubt it's in any meaningful number.

Chamix 2 hours ago | parent [-]

I assure you, the number of people paying to use Qwen3-Max or other similar proprietary endpoints is far less than 1.6 billion.

aurareturn an hour ago | parent [-]

You don't need to assure me. It's a theoretical maximum.