Author here. Happy to answer any deep-dive questions about the CUDA implementation or the Kronecker factorization math.