grosswait 3 hours ago
I would like to hear more about your setup if you're willing. Is the token-aware router you're using publicly available, or something you've written yourself?
nickreese an hour ago
It isn't publicly available, but drop me an email and I can send it to you. Basically it just tracks a list of known LM Studio instances on the network, queries their models every 15 seconds, and routes requests to the ones that have the requested model loaded. Requests sit in a FIFO queue that tracks the number of tokens per model (my servers are uniform, M4 Max 128GB Studios, but it could also track per server), and each request goes to the server that has just finished. I used to have it queue a request just as a server was expected to finish, but I was hitting timeout issues due to an edge case.
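For anyone who wants the rough shape of that design, here is a minimal sketch in Python. It is not the actual router: the `Router` and `Server` classes, the method names, and the injected `fetch_models` callable are all hypothetical, and how each instance is asked for its loaded models is left to the caller. "The one that has just finished" is read here as "the idle server with the most recent completion time", which is an assumption.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    url: str
    models: set = field(default_factory=set)  # models reported loaded at last poll
    busy: bool = False
    finished_at: float = 0.0                  # when its last request completed


class Router:
    def __init__(self, urls, fetch_models, poll_interval=15.0):
        # fetch_models(url) -> iterable of model names; hypothetical hook that
        # the caller implements against whatever API the servers expose.
        self.servers = {u: Server(u) for u in urls}
        self.fetch_models = fetch_models
        self.poll_interval = poll_interval
        self.last_poll = None
        self.queue = deque()        # FIFO of (request_id, model, tokens)
        self.tokens_per_model = {}  # tokens currently waiting, per model

    def poll(self, now=None):
        """Refresh each server's model list, at most once per poll_interval."""
        now = time.monotonic() if now is None else now
        if self.last_poll is not None and now - self.last_poll < self.poll_interval:
            return
        for server in self.servers.values():
            try:
                server.models = set(self.fetch_models(server.url))
            except OSError:
                server.models = set()  # unreachable: treat as nothing loaded
        self.last_poll = now

    def submit(self, request_id, model, tokens):
        """Enqueue a request and account for its tokens."""
        self.queue.append((request_id, model, tokens))
        self.tokens_per_model[model] = self.tokens_per_model.get(model, 0) + tokens

    def dispatch(self):
        """Assign queued requests, oldest first, to idle servers with the model."""
        assigned, waiting = [], deque()
        while self.queue:
            request = self.queue.popleft()
            request_id, model, tokens = request
            idle = [s for s in self.servers.values()
                    if not s.busy and model in s.models]
            if not idle:
                waiting.append(request)  # keeps its place in the queue
                continue
            target = max(idle, key=lambda s: s.finished_at)  # just finished
            target.busy = True
            self.tokens_per_model[model] -= tokens
            assigned.append((request_id, target.url))
        self.queue = waiting
        return assigned

    def finish(self, url, now=None):
        """Mark a server idle again and record when it finished."""
        server = self.servers[url]
        server.busy = False
        server.finished_at = time.monotonic() if now is None else now
```

A caller would run `poll()` and `dispatch()` in a loop and call `finish(url)` when a response comes back. Dispatching only after a server reports completion, rather than predicting when it will finish, matches the simpler behavior described above.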