It seems to depend on FlashAttention, so the short answer is no. Hopefully someone does the work of porting the inference code over!