pama a day ago

> If I make the plausible but not necessarily correct assumption that OpenAI's API prices reflect the cost of electricity, none of their models are even remotely that cheap

This assumption is very wrong. The primary cost factor in inference is the GPU itself. NVIDIA's profit margins are very high, and so are OpenAI's margins on API usage, even after accounting for the cost of the GPUs. You can get a sense of their margins by reading about inference at scale; the lmsys blog in my parallel answer is a decent eye opener if you thought companies sell tokens at close to the price of electricity.

pama 20 hours ago | parent

An alternative and perhaps simpler way to estimate the relative importance of GPU cost vs electricity cost is to ask how many years of running the GPU at full power it takes for the cost of industrial-scale electricity to catch up to the industrial-scale price of the GPU. The H200 has a max power draw of 700 W and costs about 40k USD (prices vary a lot); the typical lowest rental price a year ago was 2 USD/h, possibly a bit lower by now. In one hour you cannot even spend 1 kWh of electricity on it under optimal compute conditions, yet at scale you can negotiate rates below 0.05 USD per kWh in some parts of the US. Alternatively, take 0.05 USD per kWh and the GB200 NVL72, which draws 120 kW at peak. That comes to 6 USD/hour, or about $52.6k per year. Even if you ran the hardware for 10 years straight at peak performance without problems, the electricity would cost far less than the hardware itself (you have to ask NVIDIA for a quote, but expect a multi-million-dollar price tag, and they have no shortage of customers ready to pay).
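The arithmetic above can be sketched as a quick back-of-envelope script. The power draws, hardware prices, and electricity rate are the rough figures assumed in the comment, not official quotes:

```python
# Back-of-envelope: how long must a GPU run before electricity
# costs catch up to the hardware price? (All figures are the
# rough assumptions from the comment above.)

H200_POWER_KW = 0.7        # 700 W max power draw
H200_PRICE_USD = 40_000    # approximate purchase price; varies a lot
ELEC_USD_PER_KWH = 0.05    # negotiated industrial-scale rate

HOURS_PER_YEAR = 24 * 365  # 8760 h, running nonstop at peak

# Yearly electricity bill for one H200 at full power
annual_elec = H200_POWER_KW * HOURS_PER_YEAR * ELEC_USD_PER_KWH

# Years of constant peak-power operation before the electricity
# spend equals the hardware price
years_to_match = H200_PRICE_USD / annual_elec

# GB200 NVL72 rack: 120 kW at peak
rack_annual_elec = 120 * HOURS_PER_YEAR * ELEC_USD_PER_KWH

print(f"H200 electricity: ${annual_elec:,.0f}/yr; "
      f"~{years_to_match:.0f} years to equal the ${H200_PRICE_USD:,} hardware price")
print(f"GB200 NVL72 rack electricity: ${rack_annual_elec:,.0f}/yr (~$6/h)")
```

Under these assumptions a single H200 burns only on the order of $300 of electricity per year, so it would take on the order of a century of nonstop peak-power operation for electricity to match the hardware price; the rack-scale number lands at the ~$52.6k/year figure cited above.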