a_victorp 5 hours ago
If you ever do it, please make a guide! I've been toying with the same notion myself.
suprjami 5 hours ago
If you want to do it cheap, get a desktop motherboard with two PCIe slots and two GPUs. The cheap tier is dual 3060 12G, which runs 24B Q6 and 32B Q4 models at ~16 tok/sec. The limitation is VRAM for large context: 1000 lines of code is roughly 20k tokens, and a 32k-token context takes about 10G of VRAM. The expensive tier is dual 3090, 4090, or 5090, which lets you run 32B Q8 with large context, or a 70B Q6.

For software: llama.cpp and llama-swap, with GGUF models from HuggingFace. It just works.

If you need more than that, you're into enterprise hardware with 4+ PCIe slots, which costs as much as a car and draws the power of a small country. At that point you're better off just paying for Claude Code.
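
To make the context-VRAM arithmetic concrete, here is a minimal Python sketch of KV-cache sizing. The architecture numbers (64 layers, 8 KV heads via GQA, head dim 128, fp16 cache) are assumptions for a Qwen2.5-32B-style model, not universals; the real values vary per model and live in the GGUF metadata.

    # Back-of-envelope KV-cache sizing. The defaults below are assumed
    # Qwen2.5-32B-like dimensions; check your own model's GGUF metadata.

    def kv_cache_bytes(ctx_tokens, n_layers=64, n_kv_heads=8,
                       head_dim=128, bytes_per_elem=2):
        # 2x for the separate K and V tensors, per layer, per KV head.
        return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_tokens

    TOKENS_PER_LINE = 20   # parent's estimate: 1000 LOC ~= 20k tokens
    ctx = 32_768
    print(f"{ctx} tokens covers ~{ctx // TOKENS_PER_LINE} lines of code")
    print(f"KV cache at fp16: ~{kv_cache_bytes(ctx) / 2**30:.1f} GiB")
    # -> ~8 GiB for the cache alone; llama.cpp's compute buffers and
    #    scratch space push total context overhead toward the ~10G cited.

Note the cache scales with the KV head count: an older full-attention (MHA) model with 40 KV heads instead of 8 needs roughly 5x this, which is part of why GQA models are so much friendlier at long context.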
satvikpendem 4 hours ago
Jeff Geerling has (not quite but sort of) guides: https://news.ycombinator.com/item?id=46338016