I think it could be improved a lot by niche optimization passes in the codegen backend, kind of like autovectorization and similar optimizations in the CPU backends.