spott | 5 days ago
vLLM is an LLM serving framework written in raw PyTorch. ONNX doesn't support a bunch of operations that PyTorch does, so it isn't always possible to convert a PyTorch model to ONNX. TorchServe also runs raw PyTorch. Generally speaking, PyTorch is pretty well optimized. The Mac has historically been ignored, so the MPS kernels were missing or just bad, but on CUDA and Linux they are pretty good.
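To be concrete about what "kernels per backend" means: the same PyTorch code dispatches to CUDA, MPS, or CPU kernels depending on the device the tensors live on. A minimal sketch (assuming PyTorch ≥ 1.12, which added the `torch.backends.mps` check):

```python
import torch

# Pick the best available backend: CUDA (Linux/NVIDIA), MPS (Mac), else CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# The same op dispatches to that backend's matmul kernel; if a kernel is
# missing or slow for a backend (as MPS ones historically were), this is
# where you feel it.
x = torch.randn(4, 8, device=device)
w = torch.randn(8, 3, device=device)
y = x @ w
print(y.shape, device.type)
```

The point is that op coverage and kernel quality vary per backend even though the Python code is identical, which is why Mac performance lagged while CUDA stayed fast.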