bastawhiz 6 hours ago

This isn't a good analysis, mainly because it keeps rounding everything up. The author rounds up the cost of electricity by 10%, takes the high end of a range of power use (the high end being 2x the low end), and multiplies it by that inflated electricity cost.

But then the article assumes a newly purchased Mac doing the inference, running at full capacity, 24/7. Why would you do that? Apple silicon is fast, but as the author points out, you're only getting 10-40 tokens per second. That's not bad, but it's not meant for this!

It's comparing apples to oranges. Yeah, data centers don't pay residential electricity rates. Data centers use chips that are power efficient. Data centers use chips that aren't designed to be a Mac.

Apple silicon works out pretty well if you're not burning tokens 24/7/365 and you didn't buy the hardware specifically to do it. I use my Mac Studio a few times a week for things I need it for, but I can run ollama on it over the tailnet "for free". The economics work when I'm not trying to make my Mac Studio behave like an H100 cluster with liquid cooling. Which should come as no surprise to anyone: more tokens per watt on multi-tenant hardware with cheap electricity will pretty much always win.
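
For the curious, the tailnet setup is just pointing a client at the Mac's ollama server. A minimal sketch with the ollama Python client; the MagicDNS hostname and model tag are made up, and you'd need OLLAMA_HOST=0.0.0.0 set on the Mac so the server listens beyond localhost:

    from ollama import Client

    # Tailscale MagicDNS name of the Mac Studio (hypothetical)
    client = Client(host="http://mac-studio:11434")

    response = client.chat(
        model="qwen3:30b",  # whatever model the Mac has pulled; tag is illustrative
        messages=[{"role": "user", "content": "One-line summary of RAID 5 vs RAID 10?"}],
    )
    print(response["message"]["content"])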

datadrivenangel 6 hours ago | parent | next [-]

Rounding everything down in the most optimistic setting got me to $0.40 per million tokens, and OpenRouter has the same model at $0.38/Mtok.
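
For anyone who wants to redo the math with their own numbers, the electricity-only cost works out like this (the three inputs are illustrative assumptions, not measurements; amortizing the hardware is what pushes the total toward $0.40):

    # Electricity-only $/Mtok for local inference; plug in your own numbers.
    watts = 150          # assumed average draw under inference load
    tok_per_sec = 30     # assumed decode throughput
    price_kwh = 0.15     # assumed electricity price in $/kWh

    kwh_per_mtok = (1_000_000 / tok_per_sec) * watts / 1000 / 3600
    print(f"{kwh_per_mtok:.2f} kWh -> ${kwh_per_mtok * price_kwh:.2f}/Mtok")
    # 1.39 kWh -> $0.21/Mtok (before amortizing the hardware itself)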

nativeit 3 hours ago | parent | next [-]

But once all that is done you still own a Mac in one case, and you don’t in the other, correct?

teekert an hour ago | parent | next [-]

Plus your privacy.

odo1242 2 hours ago | parent | prev [-]

Yeah, this; it's the same reason mortgaging is cheaper than renting.

ericpauley 2 hours ago | parent | next [-]

This is far from a universal truth: https://www.nytimes.com/interactive/2024/upshot/buy-rent-cal...

Real estate is only a clearly good investment if you ignore opportunity cost.

seanmcdirmid 2 hours ago | parent | next [-]

You also need to pay close attention to rent vs. purchase ratios. A lot of cities are cheap to rent but expensive to buy (e.g., Beijing 10 years ago).

mantas 2 hours ago | parent [-]

Key word being "ago".

deaux an hour ago | parent [-]

Such cities still exist and have been in such a state for decades. They can change but that's meaningless as they can also change the other way around.

sgt 2 hours ago | parent | prev | next [-]

Articles like that still miss some of the nuance. Imagine having your house paid for: you grow old and you have no rent to pay. Yes, you could have invested, but likely you would have spent some of that money on something else, or your investments might not have worked out so well, or any other reason. Human reasons, to be specific. Owning property is like a lock.

orangecat 43 minutes ago | parent | next [-]

> Imagine having your house paid for, and you grow old and you have no rent to pay.

My home is "paid for". Except for the HOA and property taxes that are not that far off from what I was previously paying in rent, the ongoing maintenance costs with random large spikes, and the opportunity cost of having a large chunk of money in the house and not in the market. It was still probably the right decision, but it's not at all a free lunch.

ffsm8 a minute ago | parent | next [-]

And it's gonna be interesting to see where this narrative shifts over the next 5 yrs.

I keep hearing that US property is in its biggest bubble yet, with the affordable-housing shortage being a red herring: real estate managers and boomers are unwilling or unable to reduce their prices, despite not getting renters/buyers, because cutting prices would kick off a death spiral as their interest rates would consequently go up (because of weaker collateral). Along with the AI layoffs, etc.

I'm not American, so I only hear the occasional interview and have no idea if it's really as pressing as these industry professionals keep saying, but I'm definitely on the edge of my seat watching...

sgt 6 minutes ago | parent | prev [-]

Surely though, the HOA and all that would likely be baked into a renter's price.

And you didn't need to go live in an HOA. I don't, and it's much cheaper.

ericpauley an hour ago | parent | prev [-]

You and Matt Levine would get along: https://news.bloomberglaw.com/banking-law/matt-levines-money...

hadlock an hour ago | parent | prev [-]

It never fails, there's always someone who trots this thing out. We had bought our house, and then had to move and decided to rent. I was APPALLED that they wanted me to fill out an APPLICATION form, where they would decide my worth, and let me know if we would be allowed to live there. When buying a house, my cash was as good as anyone else's. And then the management company would come inside my house to inspect that I wasn't running a meth lab or something. Thankfully that only lasted two years. I will never rent again. Majority owner-occupied neighborhoods have different characteristics as well.

loeg 44 minutes ago | parent [-]

> I was APPALLED that they wanted me to fill out an APPLICATION form, where they would decide my worth, and let me know if we would be allowed to live there. When buying a house, my cash was as good as anyone else's.

House sellers receive offers from buyers, sometimes including letters, and can choose to sell to any of them (or none of them), whether or not those offers are higher than the listed price. It's not so different.

> And then the management company would come inside my house to inspect that I wasn't running a meth lab or something.

Yeah that part is different. I also prefer owning.

BoorishBears an hour ago | parent | prev [-]

Except one day the hype will catch up to the reality that was always true: people will realize their $20,000 Mac has less utility as a "way to learn AI" than some kid's 3090 Fortnite machine, and it'll be back to below MSRP.

novok 16 minutes ago | parent | prev | next [-]

Also, many people have even cheaper power, or even free unused surplus power, from solar.

I don't do local inference other than for hobby and learning reasons, because electricity is so expensive where I'm at.

650REDHAIR 5 hours ago | parent | prev | next [-]

I’ll keep my data local over a $.02/mtok difference.

quietsegfault 5 hours ago | parent [-]

It’s more than just data locality. OpenRouter is faster, no? I have an M4 pro, and anything but the smallest dumbest models are unusably slow for interactive use. I personally haven’t yet found a good use case for offline/non-interactive LLM work locally.

PAndreew 4 minutes ago | parent | next [-]

I'm running a local Whisper + Gemma 4 pipeline with a cheap USB mic to extract health-related data and potential todos from ambient speech. It doesn't have to be fast and doesn't have to be 100% correct, because if it captures at least a few bits of interesting information that would otherwise go unnoticed, it's still a win.
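
For anyone tempted to replicate it, a minimal sketch of this kind of pipeline, assuming openai-whisper plus a local ollama server. The file name, Gemma tag, and prompt are all made up:

    import whisper
    from ollama import Client

    asr = whisper.load_model("base")         # small ASR model; speed over accuracy is fine here
    llm = Client(host="http://localhost:11434")

    def process_clip(path: str) -> str:
        text = asr.transcribe(path)["text"]  # transcribe one recorded chunk
        reply = llm.chat(
            model="gemma3:4b",               # hypothetical local Gemma tag
            messages=[{
                "role": "user",
                "content": "From this transcript, list any health-related facts "
                           f"or todos, or say NONE:\n{text}",
            }],
        )
        return reply["message"]["content"]

    print(process_clip("ambient_0142.wav"))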

novok 21 minutes ago | parent | prev | next [-]

I played with classifying and summarizing my entire email history (per email) with small models, but that only took about 12h of GPU time at most. Using a coding-agent CLI wrapper in that case is far slower because of all the spin-up cost and the system prompt they inject, even if you want to turn it all off.

If I'd used a direct API it probably would've been much faster, but I'm doing it for hobby/fun reasons. You also get to fiddle with a lot more params.
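
Roughly, the direct version looks like this; the mbox path, labels, and model tag below are placeholders:

    import mailbox
    from ollama import Client

    llm = Client(host="http://localhost:11434")
    LABELS = "receipt, newsletter, personal, work, spam"

    def plain_text(msg) -> str:
        # Pull the first text/plain part out of a possibly-multipart message.
        if msg.is_multipart():
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    return (part.get_payload(decode=True) or b"").decode(errors="replace")
            return ""
        return (msg.get_payload(decode=True) or b"").decode(errors="replace")

    for msg in mailbox.mbox("archive.mbox"):
        snippet = plain_text(msg)[:2000]     # truncate: small model, small context
        reply = llm.chat(
            model="qwen3:8b",                # illustrative small-model tag
            messages=[{"role": "user",
                       "content": f"Classify this email as one of: {LABELS}. "
                                  f"Then summarize it in one line.\n\n{snippet}"}],
        )
        print(msg["subject"], "->", reply["message"]["content"])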

datadrivenangel 4 hours ago | parent | prev | next [-]

Yeah. The speed is the biggest issue. The intelligence of open models is good enough for serious work (though still worse than the frontier models), but the cloud models are often 3-7 times faster, and with more parallelization you can get speeds on the order of hundreds of tokens per second, which makes things fast!
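
The parallelization point is worth showing concretely: fan out independent requests and aggregate throughput climbs in a way a single local decode stream can't. A sketch with httpx against OpenRouter; the model slug is illustrative:

    import asyncio
    import os
    import httpx

    async def one(client: httpx.AsyncClient, prompt: str) -> str:
        r = await client.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
            json={"model": "qwen/qwen3-30b-a3b",  # illustrative slug
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        return r.json()["choices"][0]["message"]["content"]

    async def main() -> None:
        prompts = [f"Summarize document chunk {i}" for i in range(16)]
        async with httpx.AsyncClient() as client:
            # 16 requests in flight at once; each decodes at full provider speed.
            results = await asyncio.gather(*(one(client, p) for p in prompts))
        print(f"{len(results)} responses")

    asyncio.run(main())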

freeopinion 2 hours ago | parent [-]

Even extremely slow LLMs can generate Part B faster than I can audit Part A. So the LLM can generate Part A while I look over my email. Then it can worry over Part B while I look over Part A.

It can worry over Part C while I have my 10:30 group meet. And it can worry over Part D while I do whatever other silly, time-wasting thing all humans do in almost all organizations. Then I still haven't reviewed Part B, yet, so the extremely slow AI is waiting on me.

Maybe someday I'll be good enough to need faster AI so I can rewrite something like Bun in a few days. Right now, slow and local fits my use case very well.

quietsegfault an hour ago | parent [-]

I don't think it matters whether you're "good enough" or not. Much of AI development is iterative. If you context-switch from A in project 1 to B in project 2, back to check A, then maybe C while B finishes up, you lose the flow state that fast AI assistance can enable for those who are not fluent coders.

Sure, I can wait hours for my local model to finish, or I can spend basically as much and get the answer right away.

There’s a lot of exciting stuff with local LLMs despite the speed, but for me I don’t have the discipline and working memory to jump from project to project.

threatofrain 2 hours ago | parent | prev [-]

And continuing the argument of "more than just...": if you stop inferencing on your Mac, you still have a generally nice computer. That's the difference between rent and buy.

formerly_proven 4 hours ago | parent | prev [-]

What is it with AI SaaS naming themselves "openxyz" when there is 0% open about them?

em500 3 hours ago | parent | next [-]

They learnt from OpenAI that naming yourself open-xyz doesn't actually require opening anything.

debugnik 2 hours ago | parent | prev [-]

It's the next co-opted buzzword after "democratize".

faitswulff 5 hours ago | parent | prev | next [-]

The article makes no sense. I can't use OpenRouter as a general-purpose computing device. Why are we comparing a whole computer to a single-purpose SaaS?

mpyne 4 hours ago | parent | next [-]

They're responding to the people doing things like buying the most expensive Mac they can find specifically to do local inference for their AI agents.

Some do it to have control over their ability to use AI. Some do it because they think it will be cheaper to not have to pay a SaaS to generate tokens for them.

But for those interested in the latter case, it seems like it's not actually cheaper after all, at least at current prices. Then again, I don't expect prices to jump drastically given how much competition there is in model development.

datadrivenangel 3 hours ago | parent | next [-]

It's worth paying a premium for the privacy (assuming that llama.cpp and ollama aren't sending my sessions back to the cloud regardless...), and for the concerns about not getting a surprise bill.

dcrazy 2 hours ago | parent | prev [-]

You also have control over your costs. It is reasonable to assume that tokens will cost significantly more in the near to medium future as the market consolidates and subsidies decline.

sheepscreek 3 hours ago | parent | prev | next [-]

No, that's not the point. I think this is to help people who are thinking about getting a beefier Mac so they can run their LLMs on it too. Some in particular want a dedicated Mac Mini or Studio for this purpose. The breakdown, even if slightly flawed, offers good insight into the economics of it.

For most people, they might be better off with OpenRouter models and providers supporting Zero Data Retention. On the cloud, that’s as good as it gets for privacy - your data is never retained beyond the life of the request.
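
If you go that route, restricting routing to non-retaining providers is a one-field change on the request. My reading of OpenRouter's provider-preferences docs is the field below; verify the exact name against their current docs before relying on it:

    import os
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "qwen/qwen3-30b-a3b",            # illustrative slug
            # Skip providers that retain/train on prompts (field name per my
            # reading of the provider-routing docs; double-check it).
            "provider": {"data_collection": "deny"},
            "messages": [{"role": "user", "content": "hello"}],
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])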

tuwtuwtuwtuw 5 hours ago | parent | prev [-]

I think it's because there are a lot of people writing articles about the benefits of running local models. It's fair to say there are daily threads on HN singing the praises of local inference. I also see people buying new hardware where the main trigger is the ability to run local models.

FuckButtons 4 hours ago | parent [-]

But the people who want to do local inference put some value on privacy that isn't captured by the raw monetary comparison, so just comparing the price is somewhat beside the point. It's also true that if you have, e.g., a Mac and you use it as your main computing device, you would have spent that money anyway, so you can't really compare its value to spending on something that isn't general purpose.

apf6 2 hours ago | parent | next [-]

That's a lot of assumptions. I think there are also people buying new hardware specifically for this purpose, and their motivation to do it is thinking it will be cheaper in the long run. Privacy is not necessarily the motivation.

datadrivenangel 4 hours ago | parent | prev | next [-]

My overall opinion is that the smart thing is not to upgrade to the maximum memory for AI purposes. It's worth quantifying how much extra we pay for privacy.

tuwtuwtuwtuw 4 hours ago | parent | prev [-]

I replied to a comment asking why the article exists.

As for privacy, I'm sure there are many people that are not so interested in that aspect.

statestreet123 4 hours ago | parent | prev | next [-]

Rounded up, yes, and oddly inefficient for someone obsessed with inefficiency. One could buy a brand new 64GB M5 MacBook for well over $4k. Another could buy a scratched-up but functioning 64GB M1 Max off eBay for a little over $1k, and somehow get the same 10-20 t/s on a 31B model that the author gets with an M5. Or better yet, have a frontier model do the planning and judging, and have a local MoE model execute at 50 t/s. All of this achievable by a former English major with too much free time.
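
The planner/executor split is simple to wire up. A sketch, with hypothetical model names on both sides:

    import os
    import requests
    from ollama import Client

    local = Client(host="http://localhost:11434")

    def frontier(prompt: str) -> str:
        # Cloud model does the expensive thinking (slug is hypothetical).
        r = requests.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
            json={"model": "anthropic/claude-sonnet-4",
                  "messages": [{"role": "user", "content": prompt}]},
        )
        return r.json()["choices"][0]["message"]["content"]

    task = "Refactor the parser module for readability."
    plan = frontier(f"Write a short numbered plan for: {task}")
    for step in (s for s in plan.splitlines() if s.strip()):
        # Local MoE model grinds through each step at ~50 t/s.
        out = local.chat(model="qwen3:30b-a3b",  # hypothetical MoE tag
                         messages=[{"role": "user", "content": f"Do this step: {step}"}])
        print(out["message"]["content"])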

novok 17 minutes ago | parent [-]

I have an M1 Pro, plus an M4 Max and an M5 Max to play with at work, and the speed difference between all three machines is very significant: the M1 Pro is far slower, and the M5 is significantly faster than the M4. A Windows 3090 box beats all of them but eats twice the power per token. This is all running the same 24GB-friendly model in LM Studio.

dist-epoch 6 hours ago | parent | prev | next [-]

Using it 24/7 brings the average cost down, not up.

The less you use a local LLM, the less sense it makes, since you paid a lot for hardware you don't use.

bastawhiz 5 hours ago | parent | next [-]

That's the point: why would you buy a device that's specifically not optimized for 24/7 inference? It's expensive hardware that's not designed for that situation! The power use for inference isn't especially good, and you're only getting a fraction of the benefit from the hardware you're paying for.

apf6 2 hours ago | parent | next [-]

Good question, but people are doing it anyway. It's a fact that right now tons of people are buying Mac Minis specifically for this use case, to treat them as their personal data center for agents. The concept of "power use for inference" is foreign to them. Those people are the ones who motivated this blog post, I think.

dist-epoch 2 hours ago | parent | prev [-]

> why would you buy a device that's specifically not optimized to be used for 24/7 inference

Because it costs $1k-$2k instead of $10k-$30k+ for optimized devices.

groundzeros2015 5 hours ago | parent | prev [-]

The hardware has multiple uses for the same cost. The pay-per-use server does not.

llm_nerd 4 hours ago | parent | prev | next [-]

Your post makes sense if you bought the hardware for other reasons, and maybe run models occasionally as a novelty.

That isn't the case for many, though, and there is a whole social media space where people are hyping up the latest homebrew options for running models, believing it frees them from the yoke of big AI.

Millions of people are buying big $ maxed-out hardware like the Mac Studios or DGX specifically to run LLMs. Someone rationally running the numbers is a good thing.

atq2119 2 hours ago | parent [-]

Let's not get ahead of ourselves. Millions, really? I can believe there are a lot of enthusiasts doing this, but "millions" needs a citation.

cyanydeez 5 hours ago | parent | prev | next [-]

Nothing about the current data center craze looks efficient.

bastawhiz 5 hours ago | parent | next [-]

Whether or not you think building data centers is a good idea, it's inarguable that per-token efficiency (power, hardware, etc.) is FAR higher in a data center. That's literally what it's designed for.

cyanydeez 2 hours ago | parent [-]

I'm talking per unit of value. Look at the efficiency of the Chinese open-source models; then look at SOTA models sucking gigawatts, then the proposals.

America is basically proposing AI with the equivalent of Windows 11's bloatware.

trollbridge 4 hours ago | parent | prev [-]

Probably because lots of data centres are being built (or half-built) which are sitting idle.

mpyne 4 hours ago | parent [-]

If there are datacenters sitting idle right now then you could probably make a lot of money selling that capacity to Anthropic at this point...
