I'm not sure how well grounded this is, but I remember reading somewhere that inference is indeed profitable. My personal experience is similar: a pair of RTX 3090s draws 500-600 W, and with a similar setup you can run amazing models locally.

sandworm101 2 hours ago

Running the model isn't the cost. Watts per token is the math they show investors. You also have to keep training new models, which currently needs more compute than serving the customer base. You have to build datacenters, and possibly power plants to feed them. You have to carry debt. And you need to buy new GPUs/RAM every few years to stay competitive. The total business is vastly different from simple GPU math.
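For reference, the "watts per token" figure converts to an electricity cost roughly as follows. This is a minimal sketch: the function name, the 30 tokens/s throughput and the $0.15/kWh price are hypothetical placeholders, not numbers from this thread. As the comment above argues, it covers only the power bill for inference, not training, datacenters, debt or hardware refresh.

```python
def energy_cost_per_million_tokens(power_watts, tokens_per_second, usd_per_kwh):
    """Electricity cost (USD) to generate one million tokens at a steady load."""
    joules_per_token = power_watts / tokens_per_second  # W = J/s, so this is J per token
    kwh_per_token = joules_per_token / 3.6e6  # 1 kWh = 3.6 million J
    return kwh_per_token * usd_per_kwh * 1_000_000


# Hypothetical inputs: a 600 W rig generating 30 tokens/s at $0.15/kWh
cost = energy_cost_per_million_tokens(600, 30, 0.15)
print(f"${cost:.2f} per million tokens")
```

With those assumed inputs the rig uses 20 J per token, which works out to about $0.83 of electricity per million tokens; everything else on the list above sits outside this calculation.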

	paulddraper 9 minutes ago
		You are in violent agreement.

		> inference is indeed profitable