They do look pretty good compared to the two other linear (non-Transformer) models. Conventional attention is hard to beat on benchmarks, but it is quadratic in both time and memory with respect to sequence length.
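
For context, here is a minimal NumPy sketch of standard scaled dot-product attention (not any particular model's implementation): the n×n score matrix is what makes it quadratic in sequence length, in both compute and memory.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over (seq_len, d) arrays.

    The scores matrix is (seq_len, seq_len), which is where the
    O(n^2) time and memory cost comes from.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n, n) -- quadratic in sequence length
    weights = softmax(scores, axis=-1)  # still (n, n)
    return weights @ V                  # (n, d)

# Toy usage: doubling n quadruples the size of `scores`.
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (1024, 64)
```

Linear-attention variants avoid ever materializing that n×n matrix, trading some benchmark quality for O(n) scaling.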