pferdone an hour ago
I can see that, and I don't know your setup, but there are people pushing >70 t/s with MTP on a single 3090, and still >50 t/s with big contexts. 64k is not a lot for agentic coding, and IIRC 128k with turboquant and the like should be possible for you. r/LocalLLM/ and r/LocalLLaMA/ are worth a visit IMO.

EDIT: just found this recipe repo, may wanna give it a go: https://github.com/noonghunna/club-3090

EDIT 2: this can also shave off a lot of the context needed for tool calling -> https://github.com/rtk-ai/rtk
gchamonlive 36 minutes ago | parent
Will give more info in the post.

EDIT: thanks for the links!