xrd 5 hours ago
I feel like this is tangential to this conversation, but does anyone know of a good tool for "load balancing" usage across local GPUs?

Why: I have two RTX 3090s (24GB each). I've been using nvidia-smi to check their usage. Mostly I'm running llama.cpp with unsloth/Qwen3.6-27B-GGUF:Q4_K_M (orchestrated via opencode) and getting some pretty decent results for a self-hosted LLM. I'm surprised at how well it works for a local model.

nvidia-smi is great for determining total VRAM usage, and nvtop gives a little more insight. But I'm also running experiments with some non-LLM models (video generation, etc.) and want a way to timeslice across the GPUs, for example while my coding session is paused. This "Utilyze" tool looks like it would give me better insight into the usage of a single GPU. Can it be scripted to better utilize my GPUs across a diverse load?

Any suggestions on existing projects? I thought about vibe coding one, but wonder if there is prior art.
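For the scripting angle, a minimal sketch of what such a dispatcher could look like, assuming pynvml (`pip install nvidia-ml-py`) and an arbitrary 20% idle threshold; `my_video_gen_job.py` is a hypothetical placeholder for whatever workload you want to slot in:

    # Sketch: pick the GPU with the most free VRAM that is mostly idle,
    # then pin a job to it via CUDA_VISIBLE_DEVICES.
    import os
    import subprocess
    import pynvml

    pynvml.nvmlInit()
    best, best_free = None, -1
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
        # 20% is an arbitrary "idle enough" cutoff; tune to taste.
        if util < 20 and mem.free > best_free:
            best, best_free = i, mem.free
    pynvml.nvmlShutdown()

    if best is None:
        raise SystemExit("no idle GPU found; try again later")

    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(best))
    # Hypothetical job; replace with your actual command.
    subprocess.run(["python", "my_video_gen_job.py"], env=env, check=True)

That only handles admission (run the job on the least-loaded card); real timeslicing of concurrent jobs would need something like a queue in front of this, or NVIDIA's MPS.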
hmokiguess 4 hours ago | parent
Sort of relevant: https://github.com/Mesh-LLM/mesh-llm
| ||||||||