> they scale out in 2D, we scale up in 3D.

This actually helps a lot, thanks.

> Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic

Is this done with current manufacturing technologies? Does it require a special process?

> no streaming, no off-chip memory at all. ~1 kW, not 23 kW

Is this for an individual compute unit? Compared to Cerebras, what's the ratio of power used vs compute output?

I think you are asking about energy per token. Cerebras is 12.8J, Sophon is 25.8mJ. That is a ratio of about 500x, so a bit under three orders of magnitude.
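For anyone who wants to check the arithmetic, here is a minimal sketch that just divides the two energy-per-token figures quoted above. The numbers are taken from this thread as stated, not independently measured, and the constant and function names are made up for illustration.

```python
import math

# Energy per token as quoted in this thread (assumed, not verified).
CEREBRAS_J_PER_TOKEN = 12.8
SOPHON_J_PER_TOKEN = 25.8e-3  # 25.8 mJ expressed in joules


def energy_ratio(a_joules, b_joules):
    """Return (a / b, log10(a / b)) for two energy-per-token values."""
    ratio = a_joules / b_joules
    return ratio, math.log10(ratio)


ratio, orders = energy_ratio(CEREBRAS_J_PER_TOKEN, SOPHON_J_PER_TOKEN)
print(f"ratio: {ratio:.0f}x, orders of magnitude: {orders:.2f}")
```

With these inputs the ratio comes out just under 500x, which is why "about 2.7 orders of magnitude" is the more precise way to put it.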

	binyu 9 hours ago
		So Sophon is less efficient than Cerebras? Edit: is that joules vs. millijoules? I need better glasses.

		> Cerebras is 12.8J, Sophon is 25.8mJ

		Are your figures hypothetical, or do you have a working prototype?