▲ | Centigonal 9 hours ago | |
I can run an LLM on my RTX 3090 that is at least as useful to me in my daily life as an AAA game that would otherwise justify the cost of the hardware. This is today, which I suspect is in the upper part of the Kuznets curve for AI inference tech. I don't think a future where LLMs are too expensive to run (at least for some subset of valuable use cases) is likely. | ||
▲ | TeMPOraL 8 hours ago | parent [-] | |
I don't even get where this argument comes from. Pretraining is expensive, yes, but both LoRAs in diffusion models and finetunes of transformers show us that this is not the be-all and end-all; there's plenty of work being done on extensively tuning base models for cheap. But inference? Inference is dirt cheap and keeps getting cheaper. You can run models lagging 6-12 months behind the frontier on consumer hardware, and by this I don't mean absolutely top-shelf specs, but more of "oh cool, turns out the {upper-range gaming GPU/Apple Silicon machine} I bought a year ago is actually great at running local {image generation/LLM inference}!" level. This is not to say you'll be able to run o3 or Opus 4 on a laptop next year - larger and more powerful models obviously require more hardware resources. But this should anchor expectations a bit. We're measuring inference costs in multiples of gaming GPUs, so it's not an impending ecological disaster as some would like the world to believe - especially after accounting for data centers being significantly more efficient at this, with specialized hardware, near-100% utilization, and countless optimization hacks (including some underhanded ones). |