ilaksh 4 hours ago

Cerebras uses SRAM integrated into a giant wafer-scale chip, I think. It is extremely fast at inference -- they say 70X faster than GPU clouds, over 2,000 tokens per second of output on a 70B model. But it still uses a ton of energy as far as I know. And the chips are, I assume, expensive to produce.
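A quick back-of-envelope sketch of why on-chip SRAM matters for that number (assuming fp16 weights and one full read of the weights per generated token, which is typical for batch-1 autoregressive decoding):

```python
# Bandwidth implied by 2,000 tok/s on a dense 70B model.
# Assumptions: fp16 weights (2 bytes/param), every token reads all weights once.
params = 70e9
bytes_per_param = 2              # fp16
tokens_per_sec = 2000

weight_bytes = params * bytes_per_param        # ~140 GB of weights
bandwidth = weight_bytes * tokens_per_sec      # bytes/s of weight reads

print(f"{bandwidth / 1e12:.0f} TB/s")          # -> 280 TB/s
```

That ~280 TB/s is far beyond any off-chip HBM stack, which is roughly why it takes SRAM spread across a wafer (or aggressive batching/sparsity tricks) to hit that throughput.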

Memristors might be the way to get the next 10X or 100X in efficiency beyond where Cerebras is.

As for more complex neurons, I was thinking that if each unit stayed at a similar order of magnitude in size but could somehow do more work, that could be more efficient.