Author here. Thanks! Short version: we and Cerebras are attacking the same memory wall from opposite axes. They scale out in 2D; we scale up in 3D.
Cerebras WSE-3 is a brilliant packaging play: one wafer-scale chip (~46,000 mm², ~900k cores) with ~44 GB of SRAM spread across the plane, so compute and memory sit side by side with enormous bandwidth. The catch is density: every SRAM bit needs a six-transistor (6T) cell, so even a whole wafer holds only ~44 GB. An 80B model at 8-bit or wider weights doesn't fit on-wafer, so weights stream in from external MemoryX (off-wafer DRAM). It's fast, but it's a ~23 kW, multi-million-dollar system, and large models still depend on streaming weights from off-wafer memory.
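A quick back-of-envelope check of the capacity claim, as a minimal Python sketch. It assumes a dense model, counts weights only (no KV cache or activations), uses decimal GB, and takes the ~44 GB SRAM figure above. The precision table and the `weight_gb`/`fits` helpers are illustrative, not a statement about how Cerebras actually stores weights.

```python
# Back-of-envelope: does an 80B-parameter model's weights fit in ~44 GB of on-wafer SRAM?
# Assumptions (illustrative): dense model, weights only (no KV cache or activations),
# 1 GB = 1e9 bytes, SRAM capacity as quoted in the comment.

PARAMS = 80e9
WSE3_SRAM_GB = 44.0

# Bytes per parameter at common storage precisions (hypothetical table for illustration).
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal GB."""
    return params * bytes_per_param / 1e9


def fits(footprint_gb: float, capacity_gb: float) -> bool:
    """True if the footprint fits in the given capacity (ignores everything but weights)."""
    return footprint_gb <= capacity_gb


if __name__ == "__main__":
    for name, b in BYTES_PER_PARAM.items():
        gb = weight_gb(PARAMS, b)
        print(f"{name:>9}: {gb:6.1f} GB -> fits in {WSE3_SRAM_GB:.0f} GB SRAM: {fits(gb, WSE3_SRAM_GB)}")
```

At 16-bit the weights alone are ~160 GB and at 8-bit ~80 GB, both well past 44 GB; at 4-bit they would just squeeze in (~40 GB) with almost nothing left for activations or KV cache.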
Sophon is a single ~750 mm² die. Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic: 64 monolithic 3D tiers of 2D-TMD (transition-metal dichalcogenide) compute-in-memory and capacitor-less gain-cell DRAM. The gain cell is denser than SRAM within a single layer, and we stack 32 memory tiers of it, so we get ~330 GB on one normal-size die. That is enough for an 80B model to be fully resident: no weight streaming, no off-chip memory at all, at ~1 kW rather than 23 kW.
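The same headline numbers can be turned into rough density figures. This is a sketch that uses only the figures quoted in this thread (decimal GB; Sophon numbers are pre-silicon targets, not measurements). The power comparison sets a ~23 kW system against a ~1 kW part, so treat it as an order-of-magnitude check, not a benchmark. The `Chip` class and `gb_per_tier` helper are hypothetical names for illustration.

```python
# Rough density comparison using only the headline figures quoted in the comment.
# Assumptions: 1 GB = 1e9 bytes; Sophon values are pre-silicon targets;
# power figures compare a whole ~23 kW system to a ~1 kW part (order-of-magnitude only).
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    on_chip_gb: float  # on-die / on-wafer memory capacity
    area_mm2: float    # silicon footprint
    power_kw: float    # quoted power figure

    def gb_per_mm2(self) -> float:
        """Memory capacity per mm^2 of silicon footprint."""
        return self.on_chip_gb / self.area_mm2

    def gb_per_kw(self) -> float:
        """Memory capacity per kW of quoted power."""
        return self.on_chip_gb / self.power_kw


WSE3 = Chip("Cerebras WSE-3", on_chip_gb=44.0, area_mm2=46_000.0, power_kw=23.0)
SOPHON = Chip("Sophon (target)", on_chip_gb=330.0, area_mm2=750.0, power_kw=1.0)
SOPHON_MEMORY_TIERS = 32


def gb_per_tier(total_gb: float, tiers: int) -> float:
    """Average capacity per stacked memory tier."""
    return total_gb / tiers


if __name__ == "__main__":
    for chip in (WSE3, SOPHON):
        print(f"{chip.name:>16}: {chip.gb_per_mm2():.4f} GB/mm^2, {chip.gb_per_kw():.1f} GB/kW")
    print(f"Sophon per memory tier: {gb_per_tier(SOPHON.on_chip_gb, SOPHON_MEMORY_TIERS):.2f} GB")
    print(f"Footprint density ratio: ~{SOPHON.gb_per_mm2() / WSE3.gb_per_mm2():.0f}x")
```

With these inputs Sophon's target works out to ~10 GB per memory tier and roughly 460x more capacity per mm² of footprint than WSE-3's on-wafer SRAM; the gap comes from stacking 32 tiers of a denser cell, not from a larger footprint.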
So the real difference is SRAM-in-2D vs DRAM-in-3D: Cerebras maxes out planar SRAM area; we trade SRAM for denser DRAM and stack it vertically, which is what buys hundreds of GB of on-die capacity.
Honest caveat: Cerebras ships real silicon today and is genuinely fast; they proved wafer-scale integration works. We're pre-silicon, betting on a harder materials path (2D-TMD monolithic 3D). The upside, if it yields, is capacity per watt and per dollar that planar SRAM can't reach.