motorest 10 days ago

> As the hardware continues to iterate at a rapid pace, anything you pick up second-hand will still deprecate at that pace, making any real investment in hardware unjustifiable.

Can you explain your rationale? It seems that the worst case scenario is that your setup might not be the most performant ever, but it will still work and run models just as it always did.

This sounds like a classical and very basic opex vs capex tradeoff analysis, and these are renowned for showing that on financial terms cloud providers are a preferable option only in a very specific corner case: short-term investment to jump-start infrastructure when you do not know your scaling needs. This is not the case for LLMs.

OP seems to have invested around $600. This is around 3 months worth of an equivalent EC2 instance. Knowing this, can you support your rationale with numbers?
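For a rough sanity check, here's the back-of-the-envelope version. The hourly rate below is a placeholder I'm assuming for a comparable GPU instance, not a quoted AWS price; plug in the actual on-demand rate for whatever instance matches the homelab:

```python
# Break-even sketch: one-time homelab capex vs. renting a comparable instance.
# EC2_HOURLY is a hypothetical placeholder rate, not a quoted AWS price.

HOMELAB_COST = 600.00    # one-time capex (USD)
EC2_HOURLY = 0.28        # assumed on-demand $/hour for an equivalent instance
HOURS_PER_MONTH = 730    # average hours in a month

monthly_rent = EC2_HOURLY * HOURS_PER_MONTH
breakeven_months = HOMELAB_COST / monthly_rent

print(f"cloud cost: ${monthly_rent:.2f}/month")
print(f"break-even: {breakeven_months:.1f} months")
```

At that assumed rate the homelab pays for itself in about three months; a cheaper instance stretches the break-even point, a pricier one shortens it.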

tcdent 10 days ago | parent [-]

When considering used hardware you have to take quantization into account; gpt-oss-120b, for example, ships in the very new MXFP4 format, which will take far more than 80GB once expanded into the floating-point types available on older hardware or Apple silicon.

Open models are trained on modern hardware and will continue to take advantage of cutting-edge numeric types, so older hardware will continue to suffer worse performance and larger memory requirements.
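Rough weight-only math (these are estimates: MXFP4 packs 4-bit values plus shared block scales, so I'm assuming ~4.25 bits/param, and I'm ignoring KV cache and activations):

```python
# Approximate VRAM for just the weights of a 120B-parameter model at
# different precisions. bytes-per-param figures are estimates: MXFP4 is
# ~4-bit values plus block-scaling metadata; fp16/bf16 is the fallback
# on hardware with no native fp4/fp8 support.

PARAMS = 120e9

bytes_per_param = {
    "mxfp4 (native)": 4.25 / 8,  # ~0.53 bytes/param incl. scales
    "fp8":            1.0,
    "fp16/bf16":      2.0,       # older-hardware fallback
}

for fmt, bpp in bytes_per_param.items():
    print(f"{fmt:>14}: {PARAMS * bpp / 1e9:.0f} GB")
```

Same model, roughly 64 GB native vs. 240 GB if you have to fall back to 16-bit: that's the gap between "fits on one box" and "doesn't".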

motorest 10 days ago | parent [-]

You're using a lot of words to say "I believe yesterday's hardware might not run models as fast as today's hardware."

That's fine. The point is that yesterday's hardware is quite capable of running yesterday's models, and obviously it will also run tomorrow's models.

So the question is cost. Capex vs opex. The fact is that buying your own hardware is proven to be far more cost-effective than paying cloud providers to rent some cycles.

I brought data to the discussion: for the price tag of OP's home lab, you can only afford around 3 months of an equivalent EC2 instance. What's your counter-argument?

kelnos 10 days ago | parent | next [-]

Not the GP, but my take on this:

You're right about the cost question, but I think the added dimension that people are worried about is the current pace of change.

To abuse the idiom a bit, yesterday's hardware should be able to run tomorrow's models, as you say, but it might not be able to run next month's models (acceptably or at all).

Fast-forward some number of years, as the pace slows. Then-yesterday's hardware might still be able to run next-next year's models acceptably, and someone might find that hardware to be a better, safer, longer-term investment.

I think of this similarly to how the pace of mobile phone development has changed over time. In 2010 it was somewhat reasonable to want to upgrade your smartphone every two years or so: every year the newer flagship models were actually significantly faster than the previous year, and you could tell that the new OS versions would run slower on your not-quite-new-anymore phone, and even some apps might not perform as well. But today in 2025? I expect to have my current phone for 6-7 years (as long as Google keeps releasing updates for it) before upgrading. LLM development over time may follow at least a superficially similar curve.

Regarding the equivalent EC2 instance, I'm not comparing it to the cost of a homelab, I'm comparing it to the cost of an Anthropic Pro or Max subscription. I can't justify the cost of a homelab (the capex, plus the opex of electricity, which is expensive where I live), when in a year that hardware might be showing its age, and in two years might not meet my (future) needs. And if I can't justify spending the homelab cost every two years, I certainly can't justify spending that same amount in 3 months for EC2.

motorest 9 days ago | parent [-]

> Fast-forward some number of years (...)

I repeat: OP's home server costs as much as a few months of a cloud provider's infrastructure.

To put it another way, OP can buy brand new hardware a few times per year and still save money compared with paying a cloud provider for equivalent hardware.

> Regarding the equivalent EC2 instance, I'm not comparing it to the cost of a homelab, I'm comparing it to the cost of an Anthropic Pro or Max subscription.

OP stated quite clearly their goal was to run models locally.

ac29 9 days ago | parent [-]

> OP stated quite clearly their goal was to run models locally.

Fair, but at the point where you trust Amazon to host your "local" LLM, it's not a huge reach to just use Amazon Bedrock or something

motorest 9 days ago | parent [-]

> Fair, but at the point where you trust Amazon to host your "local" LLM, it's not a huge reach to just use Amazon Bedrock or something

I don't think you even bothered to look at Amazon Bedrock's pricing before making that suggestion. They charge per input and output token. In Amazon Bedrock, a single chat session involving 100k tokens can cost you $200. That alone is a third of OP's total infrastructure costs.
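And that's before you account for how per-token billing compounds in a chat. The whole context gets re-sent, and re-billed as input, on every turn, so a conversation totalling N tokens bills far more than N input tokens. A sketch (ignoring system prompts and prompt caching):

```python
# How billed input tokens accumulate in a multi-turn chat when the full
# context is re-sent each turn. Turn counts and sizes are illustrative.

def billed_input_tokens(turns: int, tokens_per_turn: int) -> int:
    billed = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn   # user message joins the context
        billed += context            # full context billed as input
        context += tokens_per_turn   # model reply joins the context too
    return billed

# A 20-turn chat, ~2.5k tokens per message each way: 100k tokens of text...
print(40 * 2500)                      # 100000 tokens in the conversation
print(billed_input_tokens(20, 2500))  # 1000000 tokens billed as input
```

A "100k-token" session bills roughly 10x that as input, which is why per-token pricing adds up faster than a naive word count suggests.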

If you want to discuss options in terms of cost, the very least you should do is look at pricing.

tcdent 9 days ago | parent | prev [-]

I incorporated the quantization aspect because it's not that simple.

Yes, old hardware will be slower, but you will also need significantly more of it just to run the model at all.

RAM is the expensive part. You need lots of it, and even more on older hardware, which has less efficient float implementations.

https://developer.nvidia.com/blog/floating-point-8-an-introd...

fredmcawesome 9 days ago | parent [-]

But surely this is short term? Once you get older hardware with FP4 support this shouldn't be a concern.