RossBencina | 7 days ago
One claim from that podcast was that the xLSTM attention mechanism is, in practical implementations, more efficient than (transformer) flash attention, and therefore promises to significantly reduce the time and cost of test-time compute.
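A minimal sketch (not the authors' kernels) of the intuition behind that claim: at generation time, attention must rescan a KV cache that grows with sequence length, while an mLSTM-style recurrent matrix memory only touches a fixed-size d×d state per token. The shapes, gate values, and simplified update rule below are illustrative assumptions, not the exact xLSTM formulation.

```python
import numpy as np

d = 64                      # head dimension (assumed for illustration)
rng = np.random.default_rng(0)

# --- Attention-style generation: per-step cost grows with the cache ---
K_cache, V_cache = [], []
def attention_step(q, k, v):
    K_cache.append(k); V_cache.append(v)
    K = np.stack(K_cache); V = np.stack(V_cache)   # shape (t, d), grows with t
    scores = K @ q / np.sqrt(d)                    # O(t * d) work per token
    w = np.exp(scores - scores.max()); w /= w.sum()
    return w @ V

# --- mLSTM-style generation: constant per-step cost ---
C = np.zeros((d, d))        # matrix memory state
n = np.zeros(d)             # normalizer state
def mlstm_step(q, k, v, f=0.95, i=1.0):            # f, i: assumed gate values
    global C, n
    C = f * C + i * np.outer(v, k)                 # O(d^2) work, independent of t
    n = f * n + i * k
    return (C @ q) / max(abs(n @ q), 1.0)

for t in range(8):
    q, k, v = rng.standard_normal((3, d))
    attention_step(q, k, v)   # work grows linearly in t
    mlstm_step(q, k, v)       # work stays constant in t
```

The recurrent update keeps per-token memory and compute flat as the generated sequence gets longer, which is where the test-time-compute savings would come from if the fused kernels deliver on it.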
korbip | 7 days ago | parent
Test it out here: