daveguy 3 days ago:
Well, then none of their model's numbers would be bold, and that's not what they/AIs usually see in publications!
cubefox 3 days ago:
They do look pretty good compared to the two other linear (non-Transformer) models. Conventional attention is hard to beat on benchmarks, but it is quadratic in both time and memory complexity with respect to sequence length.
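
For context, a minimal NumPy sketch (my own illustration, not from the thread) of why conventional softmax attention is quadratic: the score matrix alone has n x n entries for a sequence of length n, so both compute and memory scale as O(n^2).

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention for a single head.

    Q, K, V: arrays of shape (n, d). The score matrix S is (n, n),
    which is the source of the quadratic time/memory cost in n.
    """
    d = Q.shape[-1]
    S = (Q @ K.T) / np.sqrt(d)                      # (n, n) score matrix
    P = np.exp(S - S.max(axis=-1, keepdims=True))   # numerically stable softmax
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V                                    # (n, d) output

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (512, 64)
```

Linear-attention variants avoid materializing the (n, n) matrix, trading exactness on benchmarks for O(n) scaling.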