Aurornis 6 hours ago

> What is their next step to ensure local models never overtake them?

As someone who experiments with local models a lot, I don’t see this as a threat. Running LLMs on big server hardware will always be faster and higher quality than what we can fit on our laptops.

Even in a future where open-weight models I can run on my laptop match today's Opus, I would still use a hosted variant for most work, because it will be faster, higher quality, and won't turn my laptop or GPU into a furnace every time I run a query.

zozbot234 6 hours ago | parent [-]

If your laptop overheats when you push your GPU, you can buy purpose-built "gaming" laptops that are at least nominally intended to sustain those workloads with much better cooling. Of course, running your inference on a homelab platform deployed for that purpose, without the thermal constraints of a laptop, is also possible.

Aurornis 5 hours ago | parent [-]

I didn't say it overheats. It gets hot and the fans spin up, neither of which is enjoyable.

MacBook Pro laptops are preferred over "gaming" laptops for LLM use because they have large unified memory with high bandwidth. No gaming laptop can give you as much high-bandwidth memory for LLM weights as a MacBook Pro or an AMD Strix Halo integrated system. Discrete gaming GPUs are optimized for gaming and carry comparatively little VRAM.
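
A rough back-of-the-envelope sketch of the capacity argument (all model sizes and memory figures below are illustrative assumptions, not exact hardware specs):

    # Approximate memory check for running a local LLM.
    # Figures are assumptions for illustration, not real device specs.

    def model_footprint_gb(params_billion: float, bits_per_weight: float,
                           overhead: float = 1.2) -> float:
        """Weights plus ~20% headroom for KV cache and activations."""
        bytes_per_weight = bits_per_weight / 8
        return params_billion * 1e9 * bytes_per_weight * overhead / 1e9

    # Hypothetical 70B-parameter model quantized to 4 bits per weight.
    need_gb = model_footprint_gb(params_billion=70, bits_per_weight=4)

    # Assumed memory ceilings for comparison.
    candidates = {
        "Discrete gaming GPU (assumed 16 GB VRAM)": 16,
        "MacBook Pro (assumed 128 GB unified memory)": 128,
        "AMD Strix Halo system (assumed 96 GB usable)": 96,
    }

    print(f"Estimated footprint: {need_gb:.0f} GB")
    for name, capacity_gb in candidates.items():
        verdict = "fits" if capacity_gb >= need_gb else "does NOT fit"
        print(f"- {name}: {verdict}")

Under those assumptions the quantized 70B model needs on the order of 40 GB, which is why it fits in a large unified-memory pool but not in typical discrete gaming-GPU VRAM.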