cuTile is basically Nvidia’s Triton (no, not that Triton, OpenAI’s Triton) competitor. It takes your Python code and generates kernels at runtime. CUTLASS has a new Python interface that does the same thing.