ilaksh | 4 hours ago
Cerebras uses SRAM integrated into a giant wafer-scale chip, I think. It's extremely fast inference -- they claim 70x faster than GPU clouds, over 2,000 tokens per second of output on a 70B model. But it still uses a ton of energy as far as I know, and I assume the chips are expensive to produce. Memristors might be what gets the next 10x or 100x in efficiency from where Cerebras is.

As for more complex neurons, I was thinking that if each unit stayed at a similar order of magnitude in size but could somehow do more work, that could be more efficient.
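To make the efficiency framing concrete, here's a minimal back-of-envelope sketch in Python. The ~2,000 tokens/sec figure is the claimed Cerebras output mentioned above; the power-draw numbers are placeholder assumptions for illustration, not published specs.

    # Back-of-envelope: energy per token = power draw / throughput.
    # All power figures below are illustrative assumptions, not real specs.

    def joules_per_token(power_watts: float, tokens_per_sec: float) -> float:
        """Energy cost of generating one token, in joules."""
        return power_watts / tokens_per_sec

    # ~2,000 tok/s is the claimed Cerebras throughput on a 70B model;
    # the 20 kW system power is a placeholder assumption.
    cerebras = joules_per_token(power_watts=20_000, tokens_per_sec=2_000)

    # A hypothetical memristor-based part at the same throughput but
    # drawing 100x less power would land here.
    memristor = joules_per_token(power_watts=200, tokens_per_sec=2_000)

    print(f"Cerebras (assumed power): {cerebras:.2f} J/token")
    print(f"Memristor (hypothetical): {memristor:.4f} J/token")

The point of the arithmetic: at fixed throughput, a 100x drop in power draw is exactly a 100x drop in joules per token, so the "next 10x or 100x" claim is really a claim about power, not speed.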