daveguy 3 days ago

Well, then none of their model's numbers would be bold and that's not what they/AIs usually see in publications!

cubefox 3 days ago | parent [-]

They do look pretty good compared to the two other linear (non-Transformer) models. Conventional attention is hard to beat in benchmarks, but its time and memory costs are quadratic in sequence length.
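
To make the quadratic point concrete, here is a minimal sketch of naive single-head softmax attention in plain Python. It is not taken from the paper under discussion; the function name `naive_attention` and the choice to return the score count are assumptions made for illustration. The score matrix holds one entry per query-key pair, so doubling the sequence length quadruples it.

```python
import math

def naive_attention(q, k, v):
    """Illustrative single-head softmax attention over lists of vectors.

    Returns (outputs, number_of_scores). Hypothetical helper, not a library API.
    """
    d = len(q[0])
    # One score per (query, key) pair: len(q) * len(k) entries in total.
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        for qi in q
    ]
    outputs = []
    for row in scores:
        m = max(row)                      # subtract the max for numerical stability
        w = [math.exp(s - m) for s in row]
        z = sum(w)
        # Weighted average of the value vectors for this query.
        outputs.append(
            [sum(wj / z * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))]
        )
    return outputs, len(scores) * len(scores[0])

# Same toy vectors reused as queries, keys, and values.
seq4 = [[float(i), 1.0] for i in range(4)]
seq8 = [[float(i), 1.0] for i in range(8)]
_, n4 = naive_attention(seq4, seq4, seq4)
_, n8 = naive_attention(seq8, seq8, seq8)
print(n4, n8)  # score count grows with the square of the sequence length
```

Linear-time models avoid materializing this matrix, which is where the trade-off against benchmark quality comes from.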