wmf 12 hours ago

This design is absolutely wild. It probably won't work but I admire the dream.

minkowsky 12 hours ago | parent [-]

Author here. The economics are more realistic than those of Cerebras's wafer-scale ASIC.

JumpCrisscross 11 hours ago | parent | next [-]

Can you explain why?

minkowsky 11 hours ago | parent [-]

I have a detailed comparison with Cerebras in the economic analysis section: https://www.phantafield.com/whitepaper#7-economic-analysis

wmf 11 hours ago | parent | prev | next [-]

I'm questioning technical risks such as BEOL transistors and 2T DRAM cell structure, not the economics. Cerebras has already retired their technical risk.

minkowsky 10 hours ago | parent [-]

It's risky, like landing a rocket, but not impossible.

binyu 12 hours ago | parent | prev [-]

Hello, kudos for the tremendous work. Could you explain the difference between your design and Cerebras?

Bests

minkowsky 12 hours ago | parent [-]

Author here. Thanks! Short version: Cerebras and we are attacking the same memory wall from opposite axes — they scale out in 2D, we scale up in 3D.

Cerebras WSE-3 is a brilliant packaging play: one wafer-scale chip (~46,000 mm², ~900k cores) with ~44 GB of SRAM spread across the plane, so compute and memory sit side by side with enormous bandwidth. The catch is density — SRAM is a 6T cell, so even a whole wafer only holds ~44 GB. An 80B model doesn't fit on-wafer, so weights stream in from external MemoryX (off-wafer DRAM). It's fast, but it's a ~23 kW, multi-million-dollar system, and large models are still memory-streamed.

Sophon is a single ~750 mm² die. Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic — 64 monolithic 3D tiers of 2D-TMD compute-in-memory and capacitor-less gain-cell DRAM. The gain cell is denser than SRAM per layer, and we stack 32 memory tiers of it, so we get 330 GB on one normal-size die — enough that an 80B model is fully resident, no streaming, no off-chip memory at all. ~1 kW, not 23 kW.
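To make the "fully resident" claim concrete, here is a rough weights-only fit check using the capacities quoted in this thread. The bytes-per-parameter values are assumptions (the thread does not state a precision), and KV cache and activations are ignored, so treat this as a sketch rather than a sizing tool.

```python
# Capacities are the figures quoted in the thread; precisions are assumed.
PARAMS = 80e9  # 80B-parameter model
ON_CHIP_GB = {"Cerebras WSE-3 (SRAM)": 44, "Sophon (claimed)": 330}
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}  # assumption: not stated in the thread


def weights_gb(params, bytes_per_param):
    """Weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params * bytes_per_param / 1e9


def fits(params, bytes_per_param, capacity_gb):
    """True if the weights alone fit in the given on-chip capacity."""
    return weights_gb(params, bytes_per_param) <= capacity_gb


for chip, cap in ON_CHIP_GB.items():
    for fmt, nbytes in BYTES_PER_PARAM.items():
        need = weights_gb(PARAMS, nbytes)
        verdict = "fits" if fits(PARAMS, nbytes, cap) else "does not fit"
        print(f"{chip}: {fmt} needs {need:.0f} GB of {cap} GB -> {verdict}")
```

Under these assumptions, 80B weights need 160 GB at fp16 or 80 GB at int8: more than 44 GB, less than 330 GB.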

So the real difference is SRAM-in-2D vs DRAM-in-3D: Cerebras maxes out planar SRAM area; we trade to denser DRAM and stack it vertically, which is what buys GB-scale on-die capacity.
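As a back-of-the-envelope illustration of that trade, the capacity per unit of silicon footprint can be computed from the approximate figures above. This is a crude comparison: the WSE-3 area also holds the compute cores, and dividing by 32 assumes the claimed capacity is spread evenly over the memory tiers.

```python
def gb_per_mm2(capacity_gb, area_mm2):
    """Memory capacity per mm^2 of die footprint."""
    return capacity_gb / area_mm2


# Approximate figures quoted in the thread.
wse3 = gb_per_mm2(44, 46_000)     # planar SRAM across a wafer
sophon = gb_per_mm2(330, 750)     # stacked DRAM over one die (claimed)
MEMORY_TIERS = 32                 # memory tiers, per the thread

footprint_ratio = sophon / wse3
per_tier_ratio = footprint_ratio / MEMORY_TIERS  # assumes even spread per tier

print(f"WSE-3:  {wse3 * 1000:.2f} MB/mm^2")
print(f"Sophon: {sophon * 1000:.0f} MB/mm^2 (claimed)")
print(f"Footprint density ratio: {footprint_ratio:.0f}x")
print(f"Per memory tier:         {per_tier_ratio:.1f}x")
```

On these numbers the footprint density gap is about 460x, of which roughly 14x would come from the denser cell per tier and 32x from stacking.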

Honest caveat: Cerebras ships real silicon today and is genuinely fast — they proved wafer-scale integration works. We're pre-silicon, betting on a harder materials path (2D-TMD monolithic 3D). The upside, if it yields, is capacity-per-watt and per-dollar that planar SRAM can't reach.

binyu 11 hours ago | parent | next [-]

> they scale out in 2D, we scale up in 3D.

This actually helps a lot, thanks.

> Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic

Is this done with current manufacturing technologies? Does it require a special process?

> no streaming, no off-chip memory at all. ~1 kW, not 23 kW

Is this for an individual compute unit? Compared to Cerebras, what's the ratio of power used vs compute output?

minkowsky 10 hours ago | parent [-]

I think you're asking about energy per token. Cerebras is 12.8 J, Sophon is 25.8 mJ: a ratio of about 500x, or nearly three orders of magnitude.
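For readers checking the units, a quick sketch of the arithmetic behind those two figures. The throughput lines assume that the ~23 kW and ~1 kW system power numbers quoted earlier in the thread apply to the same workload as the energy-per-token figures, which the thread does not confirm.

```python
import math

# Energy per token as quoted in the thread, in joules.
cerebras_j = 12.8
sophon_j = 25.8e-3  # 25.8 mJ (millijoules); a pre-silicon estimate

ratio = cerebras_j / sophon_j
orders = math.log10(ratio)


def implied_tokens_per_s(power_w, joules_per_token):
    """Throughput implied by power / energy-per-token (assumes both are consistent)."""
    return power_w / joules_per_token


print(f"Energy ratio: {ratio:.0f}x ({orders:.2f} orders of magnitude)")
print(f"Cerebras implied: {implied_tokens_per_s(23_000, cerebras_j):.0f} tokens/s")
print(f"Sophon implied:   {implied_tokens_per_s(1_000, sophon_j):.0f} tokens/s")
```

The ratio comes out just under 500x, so "three orders" is a rounding up of about 2.7.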

binyu 9 hours ago | parent [-]

So Sophon is less efficient than Cerebras?

Edit: is that joules vs. millijoules? I need better glasses.

> Cerebras is 12.8J, Sophon is 25.8mJ

Are your figures hypothetical or do you have a working prototype?

matt123456789 11 hours ago | parent | prev [-]

I suspect you are being downvoted because your answer is AI-generated, but I found it very clear and will upvote.

binyu 11 hours ago | parent [-]

What makes you think his reply was AI-generated?

Edit: I can see a bunch of hints, most definitely. Still a good comment though.

minkowsky 10 hours ago | parent [-]

I do use AI for some of the answers. I now know the penalty. Thank you for the heads up.