minkowsky 12 hours ago
Author here. Thanks! Short version: Cerebras and we are attacking the same memory wall from opposite axes: they scale out in 2D, we scale up in 3D.

Cerebras WSE-3 is a brilliant packaging play: one wafer-scale chip (~46,000 mm², ~900k cores) with ~44 GB of SRAM spread across the plane, so compute and memory sit side by side with enormous bandwidth. The catch is density: SRAM is a 6T cell, so even a whole wafer only holds ~44 GB. An 80B model doesn't fit on-wafer, so weights stream in from external MemoryX (off-wafer DRAM). It's fast, but it's a ~23 kW, multi-million-dollar system, and large models are still memory-streamed.

Sophon is a single ~750 mm² die. Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic: 64 monolithic 3D tiers of 2D-TMD compute-in-memory and capacitor-less gain-cell DRAM. The gain cell is denser than SRAM per layer, and we stack 32 memory tiers of it, so we get 330 GB on one normal-size die, enough that an 80B model is fully resident, no streaming, no off-chip memory at all. ~1 kW, not 23 kW.

So the real difference is SRAM-in-2D vs DRAM-in-3D: Cerebras maxes out planar SRAM area; we trade to denser DRAM and stack it vertically, which is what buys hundreds of GB of on-die capacity.

Honest caveat: Cerebras ships real silicon today and is genuinely fast; they proved wafer-scale integration works. We're pre-silicon, betting on a harder materials path (2D-TMD monolithic 3D). The upside, if it yields, is capacity-per-watt and per-dollar that planar SRAM can't reach.
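For anyone who wants to sanity-check the capacity and power figures above, here is a rough back-of-envelope sketch. It only uses the numbers quoted in this comment (80B parameters, ~44 GB at ~23 kW, 330 GB at ~1 kW). The precisions in `BYTES_PER_PARAM` are assumptions for illustration, since no weight format is stated here, and it counts weights only (no KV cache or activations), with 1 GB taken as 1e9 bytes.

```python
# Back-of-envelope check of the figures quoted in the comment above.
# Assumptions (not from the comment): weights-only footprint, decimal GB,
# and the candidate precisions listed in BYTES_PER_PARAM.

PARAMS = 80e9  # the "80B model" from the comment

# Hypothetical weight formats; the comment does not say which one applies.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# On-chip capacity and system power as quoted in the comment.
SYSTEMS = {
    "WSE-3": {"capacity_gb": 44.0, "power_kw": 23.0},
    "Sophon": {"capacity_gb": 330.0, "power_kw": 1.0},
}


def weights_gb(params, bytes_per_param):
    """Weights-only footprint in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def fits(params, bytes_per_param, capacity_gb):
    """True if the weights alone fit in the given on-chip capacity."""
    return weights_gb(params, bytes_per_param) <= capacity_gb


def capacity_per_kw(system):
    """On-chip capacity per kilowatt of quoted system power (GB/kW)."""
    return system["capacity_gb"] / system["power_kw"]


if __name__ == "__main__":
    for name, spec in SYSTEMS.items():
        print(f"{name}: {capacity_per_kw(spec):.2f} GB/kW")
        for fmt, nbytes in BYTES_PER_PARAM.items():
            size = weights_gb(PARAMS, nbytes)
            ok = fits(PARAMS, nbytes, spec["capacity_gb"])
            print(f"  {fmt}: {size:.0f} GB of weights, fits on-chip: {ok}")
```

Under these assumptions, 16-bit weights come to 160 GB, which is well over 44 GB and under 330 GB. At 4 bits the weights alone would be 40 GB, nominally under 44 GB, so whether a model "fits" depends on the precision and on how much room is left for everything that isn't weights.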
binyu 11 hours ago
> they scale out in 2D, we scale up in 3D.

This actually helps a lot, thanks.

> Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic

Is this done with current manufacturing technologies? Does it require a special process?

> no streaming, no off-chip memory at all. ~1 kW, not 23 kW

Is this for an individual compute unit? Compared to Cerebras, what's the ratio of power used vs compute output?
matt123456789 11 hours ago
I suspect you are being downvoted because your answer is AI-generated, but I found it very clear and will upvote.