I mean, NVIDIA exposes some pretty low-level primitives, and you can always fiddle with the PTX, as DeepSeek did.