Remix clone Hacker News

	▲	storus 19 hours ago
		Does it support paged attention like vLLM though? Without that they will run into memory fragmentation quickly.
	▲	lukebechtel 19 hours ago
		Yes, great question! The system started without paged attention, and recreated its own paged attention implementation automatically once it realized it was a bottleneck. Pretty cool!
	▲	8 hours ago
		[deleted]
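
For readers unfamiliar with the idea discussed above, here is a minimal sketch of how a paged KV cache avoids fragmentation: memory is split into fixed-size blocks, and each sequence keeps a table of the blocks it owns, so freed blocks can be reused by any other sequence. This is a toy illustration only, not vLLM's code and not the implementation the project in this thread generated; the class name `PagedKVCache` and its methods are hypothetical.

```python
class PagedKVCache:
    """Toy block allocator (hypothetical): fixed-size blocks, per-sequence block tables."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # ids of unused physical blocks
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id: str):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        offset = length % self.block_size
        if offset == 0:
            # The last block is full (or the sequence has none yet): grab any free block.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], offset

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because a sequence's blocks do not need to be contiguous, the only wasted space is the unused tail of its last block, and a finished sequence's blocks become immediately usable by others. A real attention kernel would additionally have to gather keys and values through the block table, which this sketch leaves out.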