RossBencina | 5 months ago
One claim from that podcast was that the xLSTM attention mechanism is (in practical implementations) more efficient than (transformer) flash attention, and therefore promises to significantly reduce the time/cost of test-time compute.
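For intuition behind that efficiency claim: the mLSTM cell described in the xLSTM paper keeps a fixed-size matrix memory, so generating each new token costs roughly O(d^2) regardless of context length, whereas attention decoding has to re-read a KV cache that grows with every generated token. Below is a minimal NumPy sketch of that contrast; the shapes, gate values, and function names are illustrative assumptions on my part, not the actual xLSTM or FlashAttention kernels.

    # Illustrative only: per-token decode cost of attention (grows with context)
    # vs. an mLSTM-style fixed-size matrix memory (constant per step).
    import numpy as np

    d = 64  # head dimension (assumed)

    def attention_step(q, K_cache, V_cache):
        # q: (d,), K_cache/V_cache: (t, d) -- the whole history is touched each step
        scores = K_cache @ q / np.sqrt(d)      # O(t * d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V_cache               # O(t * d)

    def mlstm_step(q, k, v, C, n, f=0.95, i=1.0):
        # C: (d, d) matrix memory, n: (d,) normalizer -- size independent of t
        C = f * C + i * np.outer(v, k)         # memory update via outer product
        n = f * n + i * k
        h = C @ q / max(abs(n @ q), 1.0)       # read-out
        return h, C, n

    rng = np.random.default_rng(0)
    K_cache, V_cache = np.empty((0, d)), np.empty((0, d))
    C, n = np.zeros((d, d)), np.zeros(d)

    for t in range(256):  # decode 256 tokens
        q, k, v = rng.normal(size=(3, d))
        # attention: cache grows, each step re-reads all previous tokens
        K_cache, V_cache = np.vstack([K_cache, k]), np.vstack([V_cache, v])
        y_attn = attention_step(q, K_cache, V_cache)
        # mLSTM-style: state stays (d, d) + (d,) no matter how long the sequence gets
        y_mlstm, C, n = mlstm_step(q, k, v, C, n)

Whether that constant-per-step structure actually beats a tuned flash-attention kernel in wall-clock time is exactly the practical-implementation question the podcast claim hinges on.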
korbip | 5 months ago | parent
Test it out here: