HNisCIS 5 hours ago

LLMs don't use much energy at all to run; they use it all up front for training, which is happening constantly right now.

TLDR this is, intentionally or not, an industry puff piece that completely misunderstands the problem.

Also, even if everyone is effectively running a dishwasher cycle every day, this is still a problem we can't just ignore; that's still a massive increase in ecological impact.

simonw 5 hours ago | parent | next [-]

The training cost for a model is constant. The more individual use that model gets, the lower the training-cost-per-inference-query gets, since that one-time training cost is shared across every inference prompt.

It is true that there are always more training runs going on, and I don't think we'll ever find out how much energy was spent on experimental or failed training runs.
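
As a back-of-envelope illustration of the amortization point above (all numbers below are illustrative assumptions, not measurements of any real model):

    # Back-of-envelope: how a one-time training cost amortizes over inference volume.
    # Both figures are assumptions for illustration, not measured values.
    TRAINING_ENERGY_KWH = 50_000_000   # assumed one-time training cost (~50 GWh)
    ENERGY_PER_QUERY_KWH = 0.003       # assumed marginal inference cost per prompt (~3 Wh)

    for queries in (1e9, 1e10, 1e11):
        amortized = TRAINING_ENERGY_KWH / queries
        total = amortized + ENERGY_PER_QUERY_KWH
        print(f"{queries:.0e} queries: "
              f"{amortized * 1000:.2f} Wh training share + "
              f"{ENERGY_PER_QUERY_KWH * 1000:.1f} Wh inference = {total * 1000:.2f} Wh/query")

On those assumed figures the training share per query falls from 50 Wh to 0.5 Wh as usage grows by two orders of magnitude, while the per-query inference cost stays put.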

dietr1ch 4 hours ago | parent [-]

> The training cost for a model is constant

Constant until the next release? The battle for the benchmark-winning model is driving cadence up, and this competition probably puts a higher cost on training and evaluation too.

simonw 4 hours ago | parent [-]

Sure. By "constant" there I meant it doesn't change depending on the number of people who use the model.

dietr1ch an hour ago | parent [-]

I got that part, it's just that it overlooks the power consumption of the AI race.

I wish we'd reach the "everything is equally bad" phase, so we could start enjoying the more constant cost, rather than the current craze where everyone builds their own model with more data than the rest.

kingstnap 5 hours ago | parent | prev | next [-]

You underestimate the amount of inference and very much overestimate how much training there is.

Training is more or less the same as doing inference on an input token twice (forward and backward pass). But because it's offline and predictable, it can be done fully batched with very high utilization (i.e. efficiently).

Training is, as a guesstimate, maybe 100 trillion total tokens, but these guys apparently do inference at quadrillion-token monthly scales.
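
Taking the comment's own ballpark numbers at face value (100 trillion training tokens, a quadrillion inference tokens per month, training ~ two passes per token), a quick sketch of the comparison:

    # Rough comparison of one-time training compute vs ongoing inference compute,
    # using the ballpark figures from the comment above, not measured data.
    TRAINING_TOKENS = 100e12            # ~100 trillion tokens, one-time
    INFERENCE_TOKENS_PER_MONTH = 1e15   # ~1 quadrillion tokens per month, ongoing
    PASSES_PER_TRAINING_TOKEN = 2       # forward + backward, per the comment

    training_units = TRAINING_TOKENS * PASSES_PER_TRAINING_TOKEN
    monthly_inference_units = INFERENCE_TOKENS_PER_MONTH

    print(f"training ~ {training_units:.1e} token-passes (one-time)")
    print(f"inference ~ {monthly_inference_units:.1e} token-passes per month")
    print(f"inference matches total training compute in about "
          f"{training_units / monthly_inference_units:.1f} months")

On those numbers, inference passes the entire training run's compute in under a week, which is the comment's point.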

jeffbee 5 hours ago | parent | prev | next [-]

Training is pretty much irrelevant in the scheme of global energy use. The global airline industry uses the energy needed to train a frontier model every three minutes, and unlike AI training, the energy for air travel is 100% straight-into-your-lungs fossil carbon.
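
A rough sanity check of that "every three minutes" comparison, using loose public ballpark figures for global jet fuel burn and frontier-model training energy (both are estimates, not precise data):

    # Very rough sanity check of the "every three minutes" comparison.
    # Both inputs are loose public ballpark estimates, not precise figures.
    JET_FUEL_TONNES_PER_YEAR = 300e6       # global jet fuel burn, order of magnitude
    ENERGY_PER_TONNE_KWH = 43e9 / 3.6e6    # ~43 GJ/tonne of jet fuel, in kWh
    FRONTIER_TRAINING_KWH = 50e6           # assumed ~50 GWh for one frontier training run

    aviation_kwh_per_year = JET_FUEL_TONNES_PER_YEAR * ENERGY_PER_TONNE_KWH
    aviation_kwh_per_minute = aviation_kwh_per_year / (365 * 24 * 60)
    minutes_per_training_run = FRONTIER_TRAINING_KWH / aviation_kwh_per_minute

    print(f"aviation burns roughly {aviation_kwh_per_minute / 1e6:.1f} GWh of fuel energy per minute")
    print(f"one frontier training run is roughly {minutes_per_training_run:.0f} minutes of aviation fuel")

With a 50 GWh assumption this comes out to single-digit minutes; with the lower published training-energy estimates it lands near three, so the comparison is at least the right order of magnitude.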

pluralmonad 4 hours ago | parent [-]

Not to mention, doesn't aviation fuel still make heavy (heh) use of lead?

TSiege 4 hours ago | parent [-]

I think that's only true for propeller planes, which use leaded gasoline. Jet fuel is just kerosene.

tialaramex 3 hours ago | parent [-]

Pistons, rather than all propellers. Basically, imagine a really old car engine: simplicity is crucial for reliability and ease of maintenance, so all those "fancy" features your car had by the 1990s aren't available, but instead of turning wheels the piston engine turns a propeller. Like really old car engines, these piston engines tend to be designed for leaded fuel. Because this is relatively cheap to do, all the cheapest planes aimed at GA (General Aviation, i.e. you just like flying a plane, not for pay) are like this.

Propellers are a very common way to make aeroplanes work, though. Instead of a piston engine, which is cheap to make but relatively unreliable and expensive to run, you can use a turbine engine, which runs on Jet A aka kerosene; the rotary motion of the turbine drives the propeller, making a turboprop. In the US you won't see that many turboprops in passenger service, but in the rest of the world they're a very common choice for medium-distance routes, while the turbofan planes common everywhere in the US would in most places be focused on longer distances between bigger airfields, since they deliver peak efficiency when they spend longer up in the sky.

Jet A, whether for a turbofan or a turboprop, does not have lead in it, so to a first approximation no actual $$$ commercial flights spew lead. They're bad for the climate, but they don't spew lead into the atmosphere.

linolevan 5 hours ago | parent | prev [-]

I'm not convinced that LLM training uses so much energy that it really matters in the big picture. You can train a (terrible) LLM on a laptop[1], and frankly that's less energy-efficient than just training it on a rented cloud GPU.

Most of the innovation happening today is in post-training rather than pre-training, which is good for people concerned with energy use because post-training is relatively cheap (I was able to post-train a ~2b model in less than 6 hours on a rented cluster[2]).

[1]: https://github.com/lino-levan/wubus-1
[2]: https://huggingface.co/lino-levan/qwen3-1.7b-smoltalk
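
For a sense of scale, a post-training run like the one described above can be a short supervised fine-tuning job. A minimal sketch using Hugging Face TRL, with the base model and dataset guessed from the repo name in [2] and hyperparameters that are purely illustrative, not the settings actually used:

    # Minimal SFT (post-training) sketch with Hugging Face TRL.
    # Model and dataset names are guesses based on the linked repo name;
    # hyperparameters are illustrative, not the actual run's settings.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # Assumed dataset/config; smoltalk is a conversational SFT dataset.
    dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

    trainer = SFTTrainer(
        model="Qwen/Qwen3-1.7B",
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="qwen3-1.7b-smoltalk",
            per_device_train_batch_size=4,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
            bf16=True,
            logging_steps=50,
        ),
    )
    trainer.train()

A single epoch over a few hundred thousand chat examples is tiny next to the trillions of tokens in pre-training, which is why a run like this fits in hours on a small rented cluster.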