Go-LLM-proxy – Lightweight LLM aggregator (vLLM, Llama-server)(go-llm-proxy.com)
1 points by yatesdr 10 hours ago | 3 comments
yatesdr 10 hours ago | parent | next [-]

I run a few different models on my compute nodes and was constantly editing JSON config files to keep track of which model was running where. Built this to solve that by aggregating them all into one place behind a public nginx reverse proxy. My goal was hooking it up to claude-code or qwen when I run out of tokens so I could fall back to minimax or glm-5, and it works great for that, and also for sharing the models with other people.

MIT licensed, reasonably secure, maybe useful.
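
The core idea, aggregating several model servers behind one endpoint, can be sketched in Go as a proxy that reads the "model" field from an OpenAI-style request and reverse-proxies to whichever node serves it. This is a minimal illustration of the technique, not go-llm-proxy's actual code; the model names ("qwen", "minimax") and the stand-in backends are hypothetical:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// routes maps a model name to the upstream node serving it.
var routes = map[string]*url.URL{}

// pickBackend looks up the upstream for a model name.
func pickBackend(model string) (*url.URL, bool) {
	u, ok := routes[model]
	return u, ok
}

// handler peeks at the request body's "model" field and reverse-proxies
// the call to whichever node serves that model.
func handler(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	var req struct {
		Model string `json:"model"`
	}
	_ = json.Unmarshal(body, &req)

	target, ok := pickBackend(req.Model)
	if !ok {
		http.Error(w, "unknown model", http.StatusNotFound)
		return
	}
	r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for the upstream
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

// call sends an OpenAI-style chat request for the given model through the proxy.
func call(proxyURL, model string) string {
	resp, err := http.Post(proxyURL+"/v1/chat/completions", "application/json",
		bytes.NewBufferString(fmt.Sprintf(`{"model":%q}`, model)))
	if err != nil {
		return err.Error()
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	return string(b)
}

// demo wires up two stand-in model servers (in reality: a vLLM node and a
// llama-server node) and routes one request to each through the proxy.
func demo() (string, string) {
	a := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "served by node A")
	}))
	defer a.Close()
	b := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "served by node B")
	}))
	defer b.Close()

	routes["qwen"], _ = url.Parse(a.URL)
	routes["minimax"], _ = url.Parse(b.URL)

	proxy := httptest.NewServer(http.HandlerFunc(handler))
	defer proxy.Close()

	return call(proxy.URL, "qwen"), call(proxy.URL, "minimax")
}

func main() {
	a, b := demo()
	fmt.Println(a) // served by node A
	fmt.Println(b) // served by node B
}
```

In a real deployment, nginx would sit in front of this listener and the route table would come from the proxy's config rather than being hard-coded.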

TZubiri 10 hours ago | parent | prev [-]

So, like litellm?

yatesdr 9 hours ago | parent [-]

Pretty similar to litellm[proxy], but it supports the Responses API and does some request re-writing. It's mostly targeted at coding TUIs, but I also use it a lot for text embeddings and streaming inference in applications.
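
Since the aggregator exposes an OpenAI-compatible surface, an embeddings call from an application is just a POST to /v1/embeddings against the proxy's base URL. A minimal Go client sketch, where the stub server stands in for the proxy and the model name is hypothetical:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// embedResp mirrors the shape of an OpenAI-style embeddings response.
type embedResp struct {
	Data []struct {
		Embedding []float64 `json:"embedding"`
	} `json:"data"`
}

// embed POSTs an OpenAI-style embeddings request to baseURL and returns
// the first embedding vector.
func embed(baseURL, model, input string) ([]float64, error) {
	payload, _ := json.Marshal(map[string]any{"model": model, "input": input})
	resp, err := http.Post(baseURL+"/v1/embeddings", "application/json",
		bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out embedResp
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Data[0].Embedding, nil
}

// demo runs the client against a stub that stands in for the proxy;
// a real app would point baseURL at the aggregator's public nginx address.
func demo() ([]float64, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"data":[{"embedding":[0.1,0.2,0.3]}]}`)
	}))
	defer stub.Close()
	return embed(stub.URL, "text-embed-model", "hello world")
}

func main() {
	vec, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vec)) // 3
}
```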