nospice 5 hours ago

I'm not sure I like this method of accounting for it. The critics of LLMs tend to conflate the costs of training LLMs with the cost of generation. But this makes the opposite error: it pretends that training isn't happening as a consequence of consumer demand. There are enormous resources poured into it on an ongoing basis, so it feels like it needs to be amortized on top of the per-token generation costs.

At some point, we might end up in a steady state where the models are as good as they can be and the training arms race is over, but we're not there yet.

Aurornis 4 hours ago | parent | next [-]

That's not really an error, that's a fundamental feature of unit economics.

Fixed costs can't be rolled into the unit economics because the divisor is continually growing. The marginal costs of each incremental token/query don't depend on the training cost.
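The point about the growing divisor can be made concrete with a tiny sketch (all numbers are invented for illustration, not real training or inference figures): as cumulative query volume grows, the fixed training cost per query shrinks toward zero, so the long-run average cost per query approaches the marginal inference cost.

```python
TRAINING_COST = 100_000_000   # fixed training cost in dollars (illustrative)
MARGINAL_COST = 0.001         # inference cost per query in dollars (illustrative)

def per_query_cost(total_queries: int) -> float:
    """Average cost per query with training amortized over all queries so far."""
    return MARGINAL_COST + TRAINING_COST / total_queries

for n in (10**9, 10**11, 10**13):
    print(f"{n:>16,} queries -> ${per_query_cost(n):.6f} per query")
```

The fixed term vanishes in the limit, which is why unit economics treats it separately from marginal cost.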

cortesoft 5 hours ago | parent | prev | next [-]

It would be really hard to properly account for the training, since that won't scale with more generation.

The training is already done when you make a generative query. No matter how many consumers there are, the cost for training is fixed.

nospice 5 hours ago | parent [-]

My point is that it isn't, not really. Usage begets more training, and this will likely continue for many years. So it's not a vanishing fixed cost, but pretty much just an ongoing expenditure associated with LLMs.

bob1029 4 hours ago | parent [-]

No one doing this for money intends to train models that will never be amortized. Some will fail and some are niche, but the big ones must eventually pay for themselves or none of this works.

The economy will destroy inefficient actors in due course. The environmental and economic incentives are not entirely misaligned here.

quietbritishjim 4 hours ago | parent [-]

> No one doing this for money intends to train models that will never be amortized.

Taken literally, this is just an agreement with the comment you're replying to.

Amortizing means gradually writing a cost off over a period, and that is completely consistent with averaging it over usage. For example, if a printing company buys a big new printing machine every 5 years (because that's how long they last before they wear out), they would amortize its cost over those 5 years (strictly speaking it's depreciation rather than amortization, because it's a physical asset, but the idea is the same). It's still 100% possible to look at the number of documents printed over that period and calculate the price of the machine per document. And that's perfectly consistent with the machine paying for itself.
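The printing-machine arithmetic looks like this (every number below is invented purely for the analogy):

```python
MACHINE_COST = 500_000        # machine price in dollars (illustrative)
LIFETIME_YEARS = 5            # the machine wears out after 5 years
DOCS_PER_YEAR = 10_000_000    # documents printed per year (illustrative)

# Amortized share of the machine's cost carried by each document.
machine_cost_per_doc = MACHINE_COST / (DOCS_PER_YEAR * LIFETIME_YEARS)
print(f"${machine_cost_per_doc:.4f} of machine cost per document")
```

The machine "pays for itself" as long as the price charged per document covers this amortized share plus the per-document marginal costs, which is exactly the sense in which amortization and per-unit accounting agree.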

TSiege 5 hours ago | parent | prev | next [-]

The challenge with no longer developing new models is keeping your model up to date, which as of today requires an entire training run. Maybe they can do that less often, or they'll come up with a way to update a model after it's trained. Maybe we'll move on to something other than LLMs.

skybrian 4 hours ago | parent | prev | next [-]

The training cost is a sunk cost for the current LLM, and unknown for the next-generation LLM. Seems like it would be useful information but doesn't go here?

robocat 4 hours ago | parent | prev [-]

The AI training data sets are also expensive... The cost is especially hard to estimate for data sets that are internal to businesses like Google. Especially if the model needs to be refreshed to deal with recent data.

I presume historical internal datasets remain high value, since they might be cleaner (no slop) or maybe unavailable (copyright takedowns) and companies are getting better at hiding their data from spidering.