nserrino | 6 days ago
Hey, thanks for the thoughtful comments. A lot of big claims have been made in this area, so skepticism is the right default reaction. tl;dr: agreed that we should provide the kernels and benchmark suite so this can be evaluated by others; we'll follow up with that. A few clarifications:

1. Baselines - We didn't compare against torch.compile because, as of PyTorch 2.7, torch.compile doesn't support the MPS backend, and we ran into issues on many of the problems when we tried it. GitHub issue: https://github.com/pytorch/pytorch/issues/150121. Once it's supported, it will be the obvious baseline (a rough sketch of that kind of fallback probe is below).

2. Methodology - We followed KernelBench's protocol to establish a baseline on Metal and added further correctness checks. Warmup and synchronization were included in the timing. We recognize the limitations here and are expanding the validation suite (a minimal timing-harness sketch is also included below).

3. Optimizations - Right now most of the optimizations are fusions, but there is some use of Metal-specific primitives and optimizations. As the supervisor becomes more sophisticated, we expect the novelty of the optimized kernels to increase as well.

Overall, the goal here is to get some fraction of the benefit of a human expert in kernel engineering without the developer effort. Compiler-based optimizations are great, but hand-tuned implementations are still common for performance-critical models; the hope is that we can automate some of that process.
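On point 1, here's a minimal sketch of how one might probe torch.compile on the MPS backend and fall back to eager when the backend isn't supported. The helper name, model, and shapes are placeholders for illustration, not the actual benchmark setup.

```python
import torch

def maybe_compile(model):
    """Try torch.compile; fall back to eager if the backend rejects it.

    Hypothetical helper for illustration only; the actual baseline
    selection in the benchmark may differ.
    """
    try:
        compiled = torch.compile(model)
        # Compilation is lazy, so run a tiny input to surface backend errors here.
        device = next(model.parameters()).device
        compiled(torch.randn(2, 64, device=device))
        return compiled
    except Exception as exc:
        print(f"torch.compile unavailable on this backend, using eager: {exc}")
        return model

if torch.backends.mps.is_available():
    model = torch.nn.Linear(64, 64).to("mps")
    model = maybe_compile(model)
```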
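On point 2, this is a minimal sketch of the kind of warmup + synchronize timing loop and correctness check described above. Iteration counts, tolerances, and the placeholder op are assumptions for illustration, not the exact KernelBench protocol or suite.

```python
import time
import torch

def benchmark_mps(fn, args, warmup=10, iters=50):
    """Time fn(*args) on MPS with warmup and explicit synchronization."""
    for _ in range(warmup):
        fn(*args)
    torch.mps.synchronize()      # drain queued work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.mps.synchronize()      # make sure all kernels actually finished
    return (time.perf_counter() - start) / iters

def check_correctness(candidate, reference, args, rtol=1e-3, atol=1e-3):
    """Compare a candidate kernel's output against the eager reference."""
    return torch.allclose(candidate(*args), reference(*args), rtol=rtol, atol=atol)

if torch.backends.mps.is_available():
    x = torch.randn(1024, 1024, device="mps")
    reference = lambda t: torch.relu(t @ t)   # placeholder op, not from the suite
    candidate = reference                     # a generated Metal kernel would go here
    assert check_correctness(candidate, reference, (x,))
    print(f"{benchmark_mps(reference, (x,)) * 1e3:.3f} ms/iter")
```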