himata4113 10 hours ago

Alright, instead of all this yapping, here are the real numbers you can use as a guide:

An 8xB200[1] box costs around $250k DIY and $450k from an enterprise builder, so that will be our cost factor. It consumes around 7.8 kW at 100% load, with a median load of around 7 kW (optimistic), which means that a 240 kWh solar installation would be enough to supply it (72 kWh buffer for bad weeks / winter). That will set you back around $240k, which includes battery storage, installation and inverters; DIY cost would be lower at around $160k.
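A quick sanity check on the power sizing, as a minimal sketch. It assumes the 240 kWh figure is energy produced per day, which the comment does not state outright; under that reading the 72 kWh buffer falls out of the median load directly.

```python
# Rough daily power budget for the 8xB200 box, using the figures quoted above.
# ASSUMPTION: "240 kWh" is read as solar energy produced per day.
MEDIAN_LOAD_KW = 7.0
HOURS_PER_DAY = 24
SOLAR_KWH_PER_DAY = 240.0

# Energy the box draws in a day at median load.
daily_use_kwh = MEDIAN_LOAD_KW * HOURS_PER_DAY

# Whatever the array produces beyond that is the buffer.
buffer_kwh = SOLAR_KWH_PER_DAY - daily_use_kwh

print(f"daily use: {daily_use_kwh:.0f} kWh, buffer: {buffer_kwh:.0f} kWh")
```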

This puts the cost of the entire system anywhere from $410k to $690k. This does not take property tax or land ownership into account, since honestly it varies too much. The solar is simply used to provide a fixed cost basis for powering the hardware instead of monthly recurring payments. Financing this with a 5 year loan would cost anywhere from $8,313 to $15,700 a month.
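For anyone who wants to plug in their own numbers, here is a small sketch of the financing math using the standard fixed-rate amortization formula. The interest rates (8% and 13% APR) are assumptions picked because they land close to the quoted figures when those are read as monthly payments; the comment itself does not say which rates were used.

```python
def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Fixed monthly payment on a fully amortized loan."""
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)

# Hardware + solar totals from the comment.
low_total = 250_000 + 160_000   # DIY box + DIY solar
high_total = 450_000 + 240_000  # enterprise box + installed solar

# ASSUMPTION: 8% and 13% APR are illustrative rates, not stated in the source.
low = monthly_payment(low_total, 0.08, 60)
high = monthly_payment(high_total, 0.13, 60)

print(f"low end:  ${low:,.0f}/month on ${low_total:,}")
print(f"high end: ${high:,.0f}/month on ${high_total:,}")
```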

Now let's do the math for glm-5.2[3]. A fully optimized theoretical build can do around 1200tok/s, which is around 13-14 streams of ~90tok/s on average. Pushing batching further (and limiting context size to ~300k, with a ~150k median), you can achieve up to 37 streams at ~40tok/s, pushing the performance envelope to 1400tok/s. This means you are able to generate 2.5B to 2.9B tokens in 4 weeks.
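The throughput-to-volume conversion is easy to reproduce. The sketch below assumes a 28-day window and adds a hypothetical `utilization` knob that is not in the comment. At full utilization, 1200tok/s works out to roughly 2.9B tokens, so the 2.5B lower figure presumably bakes in some idle time, and 1400tok/s sustained would land above the quoted range.

```python
SECONDS_PER_4_WEEKS = 28 * 24 * 3600  # 2,419,200 seconds

def tokens_in_4_weeks(tok_per_s: float, utilization: float = 1.0) -> float:
    """Total output tokens over 4 weeks at a given aggregate throughput.

    `utilization` is a hypothetical knob (fraction of time the box is busy).
    """
    return tok_per_s * SECONDS_PER_4_WEEKS * utilization

# Concurrent streams you can carve out of 1200 tok/s at ~90 tok/s each.
streams_at_90 = 1200 / 90

base = tokens_in_4_weeks(1200)    # conservative batching
pushed = tokens_in_4_weeks(1400)  # aggressive batching, smaller context

print(f"streams at ~90 tok/s: {streams_at_90:.1f}")
print(f"1200 tok/s -> {base / 1e9:.2f}B tokens")
print(f"1400 tok/s -> {pushed / 1e9:.2f}B tokens")
```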

Putting the numbers together, at the low-end $8,313 a month you can serve output at $2.86 to $3.32 per million tokens, all else being equal. Considering that glm-5.2 is approaching opus level intelligence, it's pretty safe to say that the same applies for frontier labs. Input/cache writes/cache reads are very difficult to price, so this assumes you're providing input / cache for free[4]. As a very heavy user I generate around 2M to 5M output tokens a day, which at the 2M end would put me at $5.72 to $6.64 of cost per day, totalling about $200 a month[2].
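Here is a sketch of how the per-million figure appears to be derived. It assumes the range comes from dividing the low-end $8,313 payment by the 2.5B to 2.9B tokens generated in 4 weeks, and that the daily cost uses 2M output tokens a day. Both are assumptions that happen to reproduce the quoted numbers, give or take rounding.

```python
def usd_per_million(monthly_cost: float, tokens: float) -> float:
    """Cost per 1M output tokens if the whole payment is spread over `tokens`."""
    return monthly_cost / (tokens / 1e6)

# ASSUMPTION: low-end financing cost, treated as the cost of one 4-week window.
MONTHLY_COST_USD = 8_313

best = usd_per_million(MONTHLY_COST_USD, 2.9e9)   # more tokens, cheaper each
worst = usd_per_million(MONTHLY_COST_USD, 2.5e9)  # fewer tokens, pricier each

# ASSUMPTION: 2M output tokens per day, the low end of the stated usage.
TOKENS_PER_DAY_MILLIONS = 2
daily_low = TOKENS_PER_DAY_MILLIONS * best
daily_high = TOKENS_PER_DAY_MILLIONS * worst
monthly = 30 * daily_high

print(f"${best:.2f} to ${worst:.2f} per 1M output tokens")
print(f"${daily_low:.2f} to ${daily_high:.2f} per day, about ${monthly:.0f} a month")
```

At 5M tokens a day the same function gives two and a half times the daily figure, so the $200 a month only holds at the lower end of that usage range.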

What I also didn't mention is that frontier labs have BY FAR the lowest cost per token of any provider out there, due to the amount of money they invest into efficiency gains. This was proven when Anthropic saw a huge exodus of OpenAI users put strain on their systems, and with efficiency optimizations alone they managed to mitigate the bulk of the capacity issues. They did still run into limits and had to begin spreading out the duck curve, but I have zero doubts they're getting percentage points of improvement month to month.

[1]: H300s are unobtanium unless you're building rack-rooms; H200s are not that cost effective and saturate too fast while having poorer efficiency, and are only capable of running flash tier models.

[2]: Okay, I didn't expect to arrive at the $200, this is kind of entertaining.

[3]: fp8, z.ai serves fp8 according to openrouter.

[4]: Assuming you want to charge for input / cache: cache reads make up roughly 30% of the cost, output 20% and input 50%, so to price it out it would be roughly $0.30 for 1M input, $0.015 for cache reads and $1.50 for output. Judging by https://openrouter.ai/z-ai/glm-5.2#pricing, it appears that my math checks out.