Remix clone Hacker News


	▲	echion 2 hours ago
		> you can combine Spark with M3U, the former streaming the compute, lowering TTFT, the latter doing the token generation part
		Are you doing this with vLLM, or some other model-running library/setup?
	▲	coder543 2 hours ago
		They're probably referencing this article: https://blog.exolabs.net/nvidia-dgx-spark/