kingstnap 6 hours ago
Impressive performance work. It's interesting that 40+% perf gains like this are still showing up. Makes you think the cost of a fixed level of "intelligence" will keep dropping.
davidhyde 17 minutes ago
vLLM has to perform operations similar to an operating system's. If you write an operating system in Python, you will find scope for 40% improvements all over the place, and in the end it won't really be Python anymore, at least not under the hood.
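A toy illustration of the "not Python under the hood" point (hypothetical, assuming PyTorch 2.x, not vLLM's actual code): the same math written as ordinary eager ops versus handed to torch.compile, which traces the Python once and emits fused compiled kernels so the steady-state hot path mostly bypasses the interpreter.

    import torch

    def rms_norm(x, weight, eps=1e-6):
        # Eager version: each op is a separate Python call and kernel launch
        variance = x.pow(2).mean(-1, keepdim=True)
        return x * torch.rsqrt(variance + eps) * weight

    # torch.compile traces this once and generates fused compiled code,
    # so repeated calls no longer go through Python-level op dispatch.
    compiled_rms_norm = torch.compile(rms_norm)

    x = torch.randn(4, 4096)
    w = torch.ones(4096)
    print(torch.allclose(rms_norm(x, w), compiled_rms_norm(x, w), atol=1e-5))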
whoevercares 6 hours ago
Absolutely. LLM inference is still a greenfield: things like overlap scheduling and JIT CUDA kernels are very recent. We're just getting started optimizing for modern LLM architectures, so cost/perf will keep improving fast.
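By overlap scheduling I mean roughly this pattern (a minimal sketch in plain PyTorch, not vLLM's actual scheduler; it assumes a CUDA device and uses a placeholder model and placeholder batch-prep function): CUDA kernel launches are asynchronous, so the CPU can do the scheduling work for step N+1 while the GPU is still executing step N, and you only block when the results are actually needed.

    import torch

    def prepare_batch(step, device):
        # Stand-in for CPU-side scheduling work: picking requests,
        # building block tables / metadata for the next step.
        return torch.randn(8, 512, device=device)

    if torch.cuda.is_available():
        device = torch.device("cuda")
        model = torch.nn.Linear(512, 512).to(device)
        batch = prepare_batch(0, device)
        outputs = []
        for step in range(1, 5):
            out = model(batch)  # async: kernels are queued, CPU returns immediately
            # CPU-side prep for the next step overlaps GPU execution of `out`
            next_batch = prepare_batch(step, device)
            outputs.append(out)
            batch = next_batch
        torch.cuda.synchronize()  # block only once the results are needed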