DFlash immediately came to mind.
There are already several Mac implementations of it that run Qwen3.5 more than 2x faster.