skavi 14 days ago

i’m curious what advantage is derived from this existing independently of the PTX stack? i.e. why doesn’t cuTile produce PTX via a bundled compiler like Triton or (iirc) Warp?

Even if there is some impedance mismatch, could PTX itself not have been updated?

cavisne 14 days ago | parent [-]

In the presentation they said that eventually kernels will be able to share SIMT (PTX) and TileIR, but not at launch. It seems pretty mysterious why they don't just emit PTX. I would guess they are either taking the opportunity to clean things up for ML tensor core workloads, or there are some HW-specific features coming that they only want to enable through TileIR.

skavi 12 days ago | parent [-]

if i were to lean into cynicism, i might suggest this choice was meant to increase the effort required to reimplement cuda for other cards.