jameshart 6 days ago
> Servers will always have way more compute power than edge nodes

This doesn't seem right to me. Take all the memory and CPU cycles of all the clients connected to a typical online service and compare them to the memory and CPU in the datacenter serving it: the vast majority of the compute involved in delivering that experience is on the client. And there's probably a vast amount of untapped compute still available on those clients - most websites only peg the client CPU by accident, because they triggered an infinite loop in an ad bidding war; imagine what they could do if they actually used that compute power on purpose.

Even doing fairly trivial stuff, a typical browser tab uses hundreds of megabytes of memory and an appreciable percentage of the CPU of the machine it's loaded on, for as long as it's being interacted with. Meanwhile, serving that content out to the browser took milliseconds, and the server handled thousands of other requests at the same time.

Edge compute scales with the number of users of your service: each of them brings along their own hardware. Server compute has to scale at your expense.

Now, LLMs bring their own special needs - large models that have to be loaded into vast, fast memory - so there are reasons to bring the compute to the model. But it's definitely not trivially the case that there's more compute in servers than in clients.
arghwhat 6 days ago | parent
The sum of all edge nodes exceeds the power in the datacenter, but the peak power available to you from the datacenter significantly exceeds your edge node's capabilities.

A single datacenter machine with state-of-the-art GPUs serving LLM inference can draw tens of kilowatts, and you borrow a sizable portion of it for a moment when you run a prompt on one of the heavier models. A phone that has to count individual watts, or a laptop that peaks at double-digit sustained draw, isn't remotely comparable, and the gap isn't a matter of one or two hardware features.
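A rough back-of-envelope sketch of that gap, using the figures mentioned above (the exact wattages and the "quarter of a node" share are illustrative assumptions, not measurements):

    # Back-of-envelope: peak power briefly available per request.
    # All figures are rough assumptions based on the numbers in this
    # comment, not measurements of any specific hardware.

    datacenter_node_watts = 20_000  # "tens of kilowatts" for a GPU inference box
    phone_watts = 5                 # a phone counting individual watts
    laptop_watts = 30               # double-digit sustained draw

    # Suppose your prompt momentarily borrows a quarter of that node.
    borrowed_watts = datacenter_node_watts * 0.25

    print(f"vs phone:  ~{borrowed_watts / phone_watts:.0f}x the power budget")
    print(f"vs laptop: ~{borrowed_watts / laptop_watts:.0f}x the power budget")

Even with generous numbers for the laptop, the momentary budget the datacenter lends you is orders of magnitude larger than what the edge device can sustain.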