philipkiely 8 days ago
TRT-LLM has its challenges from a DX perspective, and yeah, for multimodal we still use vLLM pretty often. But for the kind of traffic we are trying to serve -- high volume and latency sensitive -- it consistently wins head-to-head in our benchmarking, and we have invested a ton of dev work in the tooling around it.