manually writing tiling logic for systolic arrays is the absolute worst. if this actually works, it'll save me so much headache.