TeMPOraL 18 hours ago
I don't even get where this argument comes from. Pretraining is expensive, yes, but LoRAs for diffusion models and finetunes of transformers both show that it isn't the be-all and end-all; plenty of work goes into extensively tuning base models on the cheap. But inference? Inference is dirt cheap and keeps getting cheaper. You can run models lagging 6-12 months behind the state of the art on consumer hardware, and by that I don't mean absolutely top-shelf specs, but more of an "oh cool, turns out the {upper-range gaming GPU/Apple Silicon machine} I bought a year ago is actually great at running local {image generation/LLM inference}!" level.

This is not to say you'll be able to run o3 or Opus 4 on a laptop next year - larger and more powerful models obviously require more hardware resources. But it should anchor expectations a bit. We're measuring inference costs in multiples of gaming GPUs, so it's not the impending ecological disaster some would like the world to believe - especially after accounting for data centers being significantly more efficient at this, with specialized hardware, near-100% utilization, and countless optimization hacks (including some underhanded ones).
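To put rough numbers on the "multiples of gaming GPUs" point, here's a back-of-envelope sketch. Every figure in it (power draw, tokens/sec, data-center throughput) is an assumption I picked for illustration, not a measurement:

    # Back-of-envelope energy cost per generated token.
    # All constants below are assumptions for illustration, not measurements.

    GPU_POWER_W = 350          # assumed draw of an upper-range gaming GPU under load
    LOCAL_TOKENS_PER_S = 30    # assumed throughput for a quantized ~7-13B model locally
    DC_POWER_W = 700           # assumed draw of a data-center accelerator
    DC_TOKENS_PER_S = 300      # assumed per-accelerator throughput with batching

    def wh_per_1k_tokens(power_w: float, tokens_per_s: float) -> float:
        """Watt-hours to generate 1000 tokens at a given power draw and throughput."""
        seconds = 1000 / tokens_per_s
        return power_w * seconds / 3600

    print(f"local:       {wh_per_1k_tokens(GPU_POWER_W, LOCAL_TOKENS_PER_S):.2f} Wh/1k tokens")  # ~3.2 Wh
    print(f"data center: {wh_per_1k_tokens(DC_POWER_W, DC_TOKENS_PER_S):.2f} Wh/1k tokens")      # ~0.65 Wh

Even with my pessimistic local numbers, ~3 Wh per thousand tokens is roughly what the same GPU burns in half a minute of gaming, which is the sense in which inference costs are measured in multiples of gaming GPUs; the batched data-center case comes out several times cheaper still.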