Still looking for vLLM to support Metal GPUs on ARM Macs
Yeah. The docs tell you to build it yourself, but…
but unlike CUDA, there are no custom Metal kernels for inference in the vLLM repo...
I think
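For context, the "build it yourself" path the docs describe for Apple silicon is a CPU-only source build, not a Metal one; exact file paths may differ between versions, but it's roughly:

```shell
# CPU-only vLLM build on an Apple-silicon Mac (sketch; paths vary by release)
git clone https://github.com/vllm-project/vllm.git
cd vllm

# Install the CPU-backend build requirements
pip install -r requirements/cpu.txt

# Target the CPU backend explicitly -- there is no Metal/MPS target
VLLM_TARGET_DEVICE=cpu pip install -e .
```

So even a successful build runs inference on the CPU; nothing in the build touches the GPU.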