flashgordon 5 days ago

Ok, I really, really have to figure out how to run the open-source LLMs locally. I know, I know - the "fixed costs" are high. But I have a strong feeling that setting up local LLMs (and the rig for them) is the next build-your-own-PC phase. All I want is a coding agent and the grunt power to run it locally. Everything else I'll build (generate) with it.

I see so many folks claiming crazy hardware rigs and performance numbers, so I have no idea where to begin. Any good starting points on this?

(Ok, budget is TBD - but seeing a "you get X for $Y" breakdown would at least help me make an informed decision.)

colonCapitalDee 5 days ago | parent | next [-]

You should consider self-hosting in the cloud. When you start coding, run a script that spins up a new VM and loads the LLM of your choice, then run another script to spin it back down when you're done (see the sketch below). For intermittent use this works great and is much cheaper than buying your own hardware, plus it's future-proof. It does admittedly lack the cool factor of truly running locally, though.
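
A minimal sketch of what those scripts could look like, assuming a GCP instance named "llm-box" that already has the model and inference server installed (the instance name, zone, and choice of cloud are placeholders - any provider with a start/stop API works the same way):

    # spin_llm.py - start or stop the GPU VM on demand (sketch)
    # assumes the gcloud CLI is installed and "llm-box" already exists
    import subprocess
    import sys

    INSTANCE = "llm-box"       # hypothetical instance name
    ZONE = "us-central1-a"     # hypothetical zone

    def set_vm(action: str) -> None:
        # gcloud compute instances start|stop NAME --zone=ZONE
        subprocess.run(
            ["gcloud", "compute", "instances", action, INSTANCE, f"--zone={ZONE}"],
            check=True,
        )

    if __name__ == "__main__":
        set_vm("start" if sys.argv[1] == "up" else "stop")

Run `python spin_llm.py up` before a session and `python spin_llm.py down` after; a stopped instance only bills for its disk, not the GPU.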

menaerus 3 days ago | parent | next [-]

Too expensive from what I have seen - the price for reasonably large GPU rigs that can host medium to large models is anywhere from ~$5/hr to ~$9/hr. That's $40-$72 for an 8-hour working day, or ~$800-$1,440 for ~20 working days in a month - over $1,000 a month on average. This doesn't math for me.
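
Spelling out that arithmetic with the ballpark rates above:

    # monthly cost of renting a large-GPU rig at the quoted ballpark rates
    low_rate, high_rate = 5, 9             # $/hr
    hours_per_day, days_per_month = 8, 20
    print(low_rate * hours_per_day * days_per_month)    # 800
    print(high_rate * hours_per_day * days_per_month)   # 1440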

colonCapitalDee 2 days ago | parent [-]

I was assuming hobbyist use, so on the order of 2-4 hours every couple of days.

flashgordon 5 days ago | parent | prev [-]

Yeah, this is the setup I am thinking of for now, as it gives all the "freedom" with hardware as the only dependence. Weirdly enough, I noticed Qwen3 Coder was priced almost the same as Opus 4, which was strange.

cpursley 5 days ago | parent [-]

Qwen pricing on fireworks.ai is pretty good

richwater 5 days ago | parent | prev | next [-]

If you're okay with lower-quality output, a $10k Mac Studio will get you there. But you _will_ have to accept lower-quality outputs compared to today's frontier models.

OtherShrezzing 5 days ago | parent | next [-]

>But you _will_ have to accept lower-quality outputs compared to today's frontier models.

I'm curious how much lower quality we're talking about here. Most of the work I ever get an LLM to do is glue code or trivial features. I'd expect some fine-tuned Codestral-type model with well-focused tasks could achieve good performance locally. I don't really need a world-leading-expert-quality model to code up a hamburger menu in a React app and set the background-color to #A1D1C1.

gnator 5 days ago | parent | prev | next [-]

Has anyone tried running with a Tenstorrent card? I wanted to see how they fare.

flashgordon 5 days ago | parent | prev [-]

Yeah, I was actually thinking about a proper rig - my gut feeling is that a rig wouldn't be as expensive as a Mac and would actually have a higher ROI (at the expense of portability)?

My other worry about the Mac is how non-upgradable it is. Again, not sure how fruitful it is - in my (probably fantasy-land) view, if I can set up a rig and then keep updating components as needed, it might last me a good 5 years for, say, $20k over that period? Or is that too hopeful?

So for $20k over 5 years, or $4k per year, it comes to roughly $330 a month - call it $400 once power is factored in - the equivalent of two Max subscriptions. Let's be honest: right now, with these limits, running more than one agent in parallel is effectively off the table.

If I can run two Claude-level models (assuming the DeepSeek and Qwen models get there), then I am already breaking even, but without having to participate in training with all my codebases (and I assume I can actually unlock something new in the process of being free).
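
Roughly, the break-even math in that comment (the $200/month Max-tier price is an assumption, not a quote):

    # rig amortization vs. two hosted Max-tier subscriptions
    rig_cost, years = 20_000, 5
    rig_per_month = rig_cost / (years * 12)    # ~333 $/mo, before power
    max_tier = 200                             # assumed $/mo per subscription
    print(round(rig_per_month), 2 * max_tier)  # 333 vs 400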

lossolo 5 days ago | parent [-]

Buy 4–8 used 3090s (providing 96–192 GB of VRAM), depending on the model and weight quantization you want to run. A used 3090 costs around $800. Add more system RAM to offload layers if needed. This setup currently offers the best value for performance.

https://www.reddit.com/r/LocalLLaMA/comments/1iqpzpk/8x_rtx_...

You can look for more rig examples on that subreddit.
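
For what it's worth, a minimal sketch of driving a box like that with llama.cpp's Python bindings - the model file, quant, and even split across four cards are placeholder choices, not a recommendation:

    # pip install llama-cpp-python (built with CUDA support)
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-32b-q4_k_m.gguf",  # hypothetical GGUF file
        n_gpu_layers=-1,            # -1 = offload every layer to the GPUs;
                                    # use a smaller number to spill into RAM
        tensor_split=[1, 1, 1, 1],  # spread the weights evenly over 4 cards
        n_ctx=16384,                # context window, VRAM permitting
    )
    out = llm("Write a binary search in Python.", max_tokens=256)
    print(out["choices"][0]["text"])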

esskay 5 days ago | parent | next [-]

I do wonder what the ongoing cost there would be. The ~$9k hardware cost is easy to quantify, but a bank of very hot, power-hungry GPUs is going to rack up a hefty monthly bill in many parts of the world.

I imagine there are also going to be some problems hooking something like that up to a normal wall socket in North America? (I, like the Reddit poster, am in Europe, so on 220V.)

icelancer 5 days ago | parent | next [-]

It's not too bad - I run 6x RTX 3090s on a 2nd-gen Threadripper with PCIe bifurcation cards. The energy usage is only really bad if you're training models constantly, but inference is light enough.

I use 208V power, but 120V can indeed be a challenge. There's a bit of a misunderstanding of how US power works: the USA has split-phase wiring, so every house has 220-240V on tap if needed, but typical outlets are 110-120V.

flashgordon 5 days ago | parent [-]

Yeah, at this point the goal is to maximize for inference - competing with the frontier labs on training is impossible from the get-go anyway. I'm trying to calculate (even amortized over 2 years) the daily cost of running a rig that can get close to single-Claude-agent performance, without needing a six-figure GPU.

icelancer 4 days ago | parent [-]

Really the only reason to have a local setup is for 24/7 on-demand high-volume inference that can't tolerate enormous cold starts.

flashgordon 5 days ago | parent | prev [-]

Yeah, this was what I was doubting too. The hardware is a one-off cost, but how much do you have to modernize your house (lines, cooling, electrical fire safety, etc.)?

flashgordon 5 days ago | parent | prev [-]

Also, I wonder if, like in the old days, you could "try" these out somewhere first. Imagine plonking down $5-10k and nothing works (which is fine if you can get a refund, ha).

paxys 5 days ago | parent | prev [-]

You can build a decent rig for yourself with:

- 2x 4070 Ti Super (32 GB total VRAM) - $2200

- 64 GB RAM - $200-250

- Core i9/Ryzen 9 CPU - $450

- 2 TB SSD - $150

- Motherboard, cooler, case, PSU - $500-600

Total - ~$3500-3700, say $4000 with extras.
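
As a rough sanity check on what fits in that 32 GB of VRAM: weights at a given quantization take roughly params x bits / 8 bytes, ignoring KV cache and runtime overhead. A back-of-envelope version:

    # back-of-envelope weight sizes at 4-bit quantization
    def weight_gb(params_billions: float, bits: float) -> float:
        return params_billions * bits / 8   # billions of params -> GB, roughly

    for name, params in [("14B", 14), ("32B", 32), ("70B", 70)]:
        print(name, f"~{weight_gb(params, 4):.0f} GB at Q4")
    # 14B ~7 GB, 32B ~16 GB, 70B ~35 GB: a Q4 32B model fits with room
    # for KV cache; a 70B needs heavier quantization or CPU offload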

gaws 2 days ago | parent | next [-]

> - 2x 4070 Ti Super (32 GB total VRAM) - $2200

4070 card for $1,100? In this market?

flashgordon 5 days ago | parent | prev [-]

Wow - do you mind sharing any links to a specific setup? Also, what's the biggest model anybody has run on this?

paxys 5 days ago | parent | next [-]

You can run a decent model on it - say, a highly quantized Qwen or DeepSeek R1 getting 5-10 tokens/sec output - but it will be nothing in comparison to a commercial offering like Claude, o3, or Gemini. For that you need a datacenter-class GPU going for $50K-$100K a pop.
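
Those single-digit numbers follow from memory bandwidth: decode is bandwidth-bound, so tokens/sec is roughly the bytes the GPU can read per second divided by the bytes of weights read per token. A back-of-envelope version, where the bandwidth figure is an approximation for a 4070-Ti-Super-class card:

    # decode speed ceiling ~= memory bandwidth / weight size
    bandwidth_gb_s = 670   # approx. for a 4070-Ti-Super-class card
    weights_gb = 40        # e.g. a heavily quantized large model
    print(bandwidth_gb_s / weights_gb)   # ~17 tok/s ceiling on one card
    # layers that spill to system RAM (~50-100 GB/s) drag the average
    # down hard, which is how you land in the 5-10 tok/s range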

mtkd 4 days ago | parent [-]

But a small collective running that box, especially one spanning timezones, could potentially be a viable alternative - or will be soon - with obvious privacy gains too.

lossolo 5 days ago | parent | prev | next [-]

Unfortunately, you will not be able to run any model on this that is comparable to the Claude models.

icelancer 5 days ago | parent | prev [-]

Every model you run on that setup will be at best half as good as Sonnet 4.