Remix.run Logo
davidhyde 2 hours ago

vLLM needs to perform similar operations to an operating system. If you write an operating system in Python you will have scope for many 40% improvements all over the place and in the end it won’t be Python anymore, at least under the hood it won’t be.

menaerus an hour ago | parent [-]

It's not about the python at all. Optimization techniques are on a completely different level, on the level of the chip and/or hw platform and finding ways to utilize them in a max manner by exploiting the intrinsic details about their limitations.