pants2 5 hours ago

Cool idea. Just some back-of-the-envelope math here (not trusting what's on their site):

My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B. Darkbloom's pricing is $0.20 per Mtok output.

That's about $2.24/day or $67/mo revenue if it's fully utilized 24/7.

Now, assuming a 50W sustained load, that's about 36 kWh/mo, which at ~$0.25/kWh comes to approx. $9/mo in costs.

Could be good for lunch money every once in a while! Around $700/yr.
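A minimal Python sketch of the arithmetic above, assuming 100% utilization and a 30-day month (both simplifications). The function name `monthly_economics` is just an illustrative label; the inputs are the figures from the comment.

```python
# Back-of-the-envelope economics for one machine serving inference around the clock.
# Assumptions: 100% utilization, a 30-day month, inputs taken from the comment above.

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # simplification


def monthly_economics(tok_per_s: float, usd_per_mtok: float,
                      watts: float, usd_per_kwh: float) -> dict:
    """Return revenue, electricity use/cost, and net earnings (per month and per year)."""
    tokens_per_day = tok_per_s * SECONDS_PER_DAY
    revenue_per_day = tokens_per_day / 1_000_000 * usd_per_mtok
    revenue_per_month = revenue_per_day * DAYS_PER_MONTH

    kwh_per_month = watts * 24 * DAYS_PER_MONTH / 1000  # W * hours -> Wh -> kWh
    cost_per_month = kwh_per_month * usd_per_kwh

    net_per_month = revenue_per_month - cost_per_month
    return {
        "revenue_per_day": revenue_per_day,
        "revenue_per_month": revenue_per_month,
        "kwh_per_month": kwh_per_month,
        "cost_per_month": cost_per_month,
        "net_per_month": net_per_month,
        "net_per_year": net_per_month * 12,
    }


result = monthly_economics(tok_per_s=130, usd_per_mtok=0.20, watts=50, usd_per_kwh=0.25)
for key, value in result.items():
    print(f"{key:>17}: {value:,.2f}")
```

With these inputs it gives about $2.25/day and $67.39/mo in revenue, 36 kWh and $9.00/mo in electricity, and roughly $700/yr net, so the ~$700/yr figure is revenue minus electricity.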

torginus 20 minutes ago

Also, this assumes hardware never fails. I learned this the hard way when I started mining crypto on my 5700XT a while back.

I figured since I already used it a lot, and I've never had a GPU fail on me, it would be fine.

The fans died after a month of constant use, and replacing them cost more than I had made mining.

mavamaarten 5 hours ago

Well, running your machine to do inference will draw more than 50W sustained; I'd say more than double that. Electricity is also more expensive here (though granted, I do have solar panels). And don't forget to factor in that your hardware will age faster.

I'd say it's not worth it. But the idea is cool.

		jorvi 2 hours ago
		Your hardware will age more slowly under a consistent load. Thermal stress from bursty workloads is a much bigger wear problem than electromigration. If you can consistently keep the SoC at a specific temperature, it'll last much longer. This is also why it was very ironic that crypto-mining GPUs got sold at massive discounts. Everyone assumed they had been run ragged, but a proper miner would have undervolted the card and run it at consistent utilization, so the card would be in better condition than a secondhand gaming GPU that had constantly swung between 1% and 80% utilization, or rather, between 30°C and 75°C.
		kennywinker 4 hours ago
		Their estimate assumes significantly lower power draw under load, e.g. 25W for an M4 Pro Mac mini. I have no idea if that's realistic, but the M4s are supposedly pretty efficient (https://www.jeffgeerling.com/blog/2024/m4-mac-minis-efficien...)
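To put the power-draw disagreement in numbers, here is a rough Python sketch of monthly electricity cost at the wattages mentioned so far (25W, 50W, and "more than double" 50W, taken as 100W). It assumes 24/7 operation, a 30-day month, and the ~$0.25/kWh rate from the top comment; the scenario labels and the helper `monthly_cost_usd` are illustrative, not taken from Darkbloom's site.

```python
# Monthly electricity cost at the power draws mentioned in this thread.
# Assumptions: 24/7 operation, a 30-day month, and the top comment's ~$0.25/kWh rate.

HOURS_PER_MONTH = 24 * 30
USD_PER_KWH = 0.25  # carried over from the top comment; local rates vary

# Label -> sustained watts (figures as quoted in the comments, not measured here)
scenarios = {
    "Darkbloom's M4 Pro mini figure": 25,
    "top comment's assumption": 50,
    "'more than double that'": 100,
}


def monthly_cost_usd(watts: float, usd_per_kwh: float = USD_PER_KWH) -> float:
    """Electricity cost of running at a constant draw for one 30-day month."""
    kwh = watts * HOURS_PER_MONTH / 1000
    return kwh * usd_per_kwh


for label, watts in scenarios.items():
    print(f"{label:<32} {watts:>4} W  ${monthly_cost_usd(watts):6.2f}/mo")
```

Under these assumptions that works out to $4.50, $9.00, and $18.00 per month; a pricier local rate scales each figure linearly.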

kennywinker 5 hours ago

Their example top-earning models are FLUX.2 Klein 4B and FLUX.2 Klein 9B, which I imagine could generate a lot more tokens/s than a 26B model on your machine.

For Gemma 4 26B their math is:

single_tok/s = (307 GB/s / 4 GB) * 0.60 = 46.0 tok/s

batched_tok/s = 46.0 * 10 * 0.9 = 414.4 tok/s

tok/hr = 414.4 * 3600 = 1,492,020

revenue/hr = (1,492,020 / 1M) * $0.200000 = $0.2984

I have no idea if that is a good estimate of how much an M5 Pro can generate, but that's what it says on their site.
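The quoted formula can be reproduced in a few lines of Python. The constant names are hypothetical labels, and reading 0.60 and 10 * 0.9 as an efficiency factor and a batch size with batching efficiency is an assumption, not something the quote spells out. Carrying the unrounded value through (307 / 4 * 0.60 = 46.05 tok/s, then 414.45 tok/s batched) is what produces 1,492,020 tok/hr; the 46.0 and 414.4 shown above are rounded.

```python
# Reproduces the Gemma 4 26B throughput/revenue estimate quoted from Darkbloom's site.
# Constant names are hypothetical labels; their interpretation is an assumption.

MEM_BANDWIDTH_GB_S = 307   # memory bandwidth figure used in the quote
MODEL_BYTES_GB = 4         # GB read per generated token
SINGLE_STREAM_EFF = 0.60   # assumed: fraction of peak bandwidth achieved
BATCH_SIZE = 10            # assumed: concurrent streams
BATCH_EFF = 0.9            # assumed: per-stream efficiency when batched
USD_PER_MTOK = 0.20

single_tok_s = MEM_BANDWIDTH_GB_S / MODEL_BYTES_GB * SINGLE_STREAM_EFF
batched_tok_s = single_tok_s * BATCH_SIZE * BATCH_EFF
tok_per_hr = batched_tok_s * 3600
revenue_per_hr = tok_per_hr / 1_000_000 * USD_PER_MTOK

print(f"single:  {single_tok_s:.2f} tok/s")
print(f"batched: {batched_tok_s:.2f} tok/s")
print(f"tok/hr:  {tok_per_hr:,.0f}")
print(f"revenue: ${revenue_per_hr:.4f}/hr")
```

For comparison, 414.45 tok/s is roughly 3.2x the 130 tok/s measured in the top comment.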

They do a bit of a sneaky thing with the power calculation: they subtract 12W of idle power because they assume your machine is idling 24/7 anyway, so the only cost is the extra 18W they estimate you'll use doing inference. Idk about you, but I do turn my machine off when I'm not using it.
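A sketch of how much the idle subtraction matters, assuming the 12W idle / 18W extra figures above (so roughly 30W total under load), 24/7 operation, a 30-day month, and the top comment's ~$0.25/kWh. The helper `monthly_cost` is a hypothetical name.

```python
# Compares marginal-power accounting (load minus idle) with full-power accounting
# for a machine that would otherwise be switched off.
# 12 W idle / 18 W extra are the quoted figures; $0.25/kWh is borrowed from the top comment.

IDLE_W = 12
EXTRA_W = 18
USD_PER_KWH = 0.25
HOURS_PER_MONTH = 24 * 30


def monthly_cost(watts: float) -> float:
    """Cost of a constant draw over a 30-day month."""
    return watts * HOURS_PER_MONTH / 1000 * USD_PER_KWH


marginal = monthly_cost(EXTRA_W)       # machine assumed to idle 24/7 anyway
full = monthly_cost(IDLE_W + EXTRA_W)  # machine would otherwise be off

print(f"marginal (idle already paid): ${marginal:.2f}/mo")
print(f"full (machine otherwise off): ${full:.2f}/mo")
```

That's about $3.24/mo versus $5.40/mo under these assumptions; the marginal figure only holds if the machine would otherwise be idling around the clock.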

nnx 4 hours ago

> My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B.

This seems high. At which quantization? Using LM Studio or something else?

Note: Darkbloom seems to run everything on Q8 MLX.

todotask2 5 hours ago

Only about 5% of OpenAI's users are paying customers, so how does it generate revenue?

I don’t think this is a sustainable business model. For example, Cubbit tried to build decentralised storage, but I backed out because better alternatives now exist, and hardware continues to improve and become cheaper over time.

Your electricity and hardware ownership will earn a lower return, and this doesn't actually reduce CO2.

chaoz_ 5 hours ago

Genuinely curious: is there any way to estimate the amortization of a Mac?

I’d imagine 1 year of heavy usage would somehow affect its quality.
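One simple way to frame it is straight-line amortization: spread the purchase price minus the expected resale value evenly over the hours the machine is in service. The sketch below assumes that model; the function name and the example prices are hypothetical placeholders, not real Mac prices or resale data. In this model, wear from heavy use would show up as a lower resale value or a shorter service life.

```python
# Straight-line amortization of hardware, expressed per hour in service.
# The model and all example numbers are hypothetical placeholders.


def hourly_amortization(purchase_usd: float, resale_usd: float,
                        years: float, hours_per_day: float = 24) -> float:
    """Spread (purchase - resale) evenly over every hour the machine is in service."""
    if years <= 0 or hours_per_day <= 0:
        raise ValueError("years and hours_per_day must be positive")
    service_hours = years * 365 * hours_per_day
    return (purchase_usd - resale_usd) / service_hours


# Placeholder example: a $2,000 machine resold for $1,000 after 2 years of 24/7 use.
per_hour = hourly_amortization(purchase_usd=2000, resale_usd=1000, years=2)
print(f"${per_hour:.4f}/hr, ${per_hour * 24 * 30:.2f}/mo")
```

The hourly figure can then be subtracted from hourly net revenue to see whether running the machine 24/7 still comes out ahead.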

		pants2 5 hours ago
		Yeah, the only way to get there is to assume they're not giving prompt caching discounts while my laptop gets prompt caching benefits, with many large prompts. So yes, I am skeptical of their numbers.

xendo 5 hours ago

Any idea what accounts for such a gap between your numbers and theirs? Batching? Or could they be doing aggressive prefix caching across all nodes to reduce the actual processing?

znnajdla 5 hours ago

Maybe lunch money for you, but there are people in some parts of the world, like Ukraine, who live on $200/month.

sethherr 4 hours ago

But they probably don't have M5 MacBook Pros sitting idle.

		znnajdla 31 minutes ago
		They can acquire one if it offers real opportunities like this.
		tonyedgecombe 4 hours ago
		Or reliable energy or internet.

MrDrMcCoy 5 hours ago

Don't forget to factor in cooling costs.

		pants2 5 hours ago
		Or saved heating costs in the winter!