rapsey 2 days ago

> Inference is significantly less power hungry, so it can run base load 24/7.

All major AI providers need to throttle usage because their GPU clusters are at capacity. There is absolutely no way inference is less power hungry when you have many thousands of users hammering your servers at all times.

blitzar 2 days ago | parent [-]

Furthermore, NVIDIA's ~80% profit margin makes idling your biggest capital expense a huge ROI problem. Google and Apple, with their own in-house chips, should have a big advantage in this regard.

If capital outlay and running costs were more evenly balanced, then optimising the running cost would become a big line item on the accounts.
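A back-of-envelope calculation illustrates the point. All numbers here are hypothetical assumptions for illustration, not vendor figures: a per-GPU purchase price, depreciation horizon, power draw, and electricity price are made up to show how capex per useful hour scales with utilisation while power cost does not.

```python
# Hypothetical numbers, chosen only to illustrate the capex/opex imbalance.
gpu_price = 30_000.0    # assumed purchase price per GPU (USD)
lifetime_years = 4      # assumed depreciation horizon
power_kw = 0.7          # assumed draw per GPU under load (kW)
electricity = 0.10      # assumed electricity price (USD per kWh)

hours = lifetime_years * 365 * 24
capex_per_hour = gpu_price / hours          # paid whether the GPU runs or idles
opex_per_hour = power_kw * electricity      # only paid while running

for utilisation in (1.0, 0.5, 0.25):
    # Capex is sunk, so effective capex per *useful* hour scales as 1/utilisation;
    # the power bill stays roughly flat per hour of actual work.
    total = capex_per_hour / utilisation + opex_per_hour
    print(f"utilisation {utilisation:>4.0%}: "
          f"capex ${capex_per_hour / utilisation:.2f} + power ${opex_per_hour:.2f} "
          f"= ${total:.2f} per useful GPU-hour")
```

With these assumed numbers, capex is roughly an order of magnitude larger than the power cost per hour, so halving idle time saves far more money than halving power draw, which is the asymmetry described above.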