Remix clone Hacker News


	▲	grosswait 3 hours ago
		I would like to hear more about your setup if you’re willing. Is the token-aware router you’re using publicly available, or is it something you’ve written yourself?
	▲	nickreese an hour ago
		It isn't open source, but drop me an email and I can send it to you. Basically it just tracks a list of known LM Studio instances on the network, queries their models every 15 seconds, and routes requests to the ones that have the requested model loaded. Requests sit in a FIFO queue, and it tracks the number of tokens per model (my servers are uniform, M4 Max 128GB Studios, but it could also track this per server), then routes the next request to the server that has just finished. I used to have it queue a request just as a server was expected to finish, but I was facing timeout issues due to an edge case.
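For anyone trying to picture the routing described above, here is a minimal in-memory sketch. It is not the actual router (that code isn't public); every name in it (`Server`, `Router`, `update_models`, `submit`, `finish`) is hypothetical, and it assumes each server handles one request at a time. The 15-second polling is left out: `update_models` is simply where a poller would write whatever model list it fetched.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    url: str
    models: set[str] = field(default_factory=set)  # refreshed by a poller (not shown)
    busy: bool = False                             # assumption: one request at a time
    tokens_by_model: dict[str, int] = field(default_factory=dict)


class Router:
    def __init__(self, urls):
        self.servers = {u: Server(u) for u in urls}
        self.waiting = deque()  # FIFO of (request_id, model)

    def update_models(self, url, models):
        # Called by the periodic poller with the models currently loaded.
        self.servers[url].models = set(models)

    def submit(self, request_id, model):
        # Enqueue, then try to place as many waiting requests as possible.
        self.waiting.append((request_id, model))
        return self._dispatch()

    def finish(self, url, model, tokens):
        # Record token usage, free the server, and hand it the next fitting request.
        server = self.servers[url]
        server.busy = False
        server.tokens_by_model[model] = server.tokens_by_model.get(model, 0) + tokens
        return self._dispatch(prefer=url)

    def _dispatch(self, prefer=None):
        assigned, remaining = [], deque()
        while self.waiting:
            request_id, model = self.waiting.popleft()
            idle = [s for s in self.servers.values()
                    if not s.busy and model in s.models]
            if not idle:
                remaining.append((request_id, model))  # keeps FIFO order
                continue
            # Prefer the server that just finished, else the first idle one.
            target = next((s for s in idle if s.url == prefer), idle[0])
            target.busy = True
            assigned.append((request_id, target.url))
        self.waiting = remaining
        return assigned
```

Dispatching only on `finish` (instead of predicting when a server is about to finish) mirrors the simpler behaviour described in the comment, and avoids handing a request to a server that turns out to still be busy.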