zozbot234 8 hours ago
> LLMs are far more efficient on hardware that simultaneously serves many requests at once.

The LLM inference itself may be more efficient (though this is affected by different throughput vs. latency tradeoffs; local inference makes it easier to tolerate higher latency), but making the hardware is not. The cost of datacenter-class hardware is orders of magnitude higher, and repurposing existing hardware is a real efficiency gain.
Tepix 8 hours ago | parent
Seems doubtful. Utilisation will be very high for datacenter silicon, whereas your PC or phone at home is mostly idle.
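A back-of-envelope sketch of the utilisation argument: amortize the hardware price over the hours it actually spends doing work. All the numbers below (prices, lifetimes, duty cycles) are made-up placeholders for illustration, not measurements.

```python
# Illustrative amortization: hardware cost per hour of *actual* compute,
# as a function of how often the device is busy. Numbers are hypothetical.

def amortized_cost_per_active_hour(hardware_cost, lifetime_years, utilization):
    """Dollars per hour of real work, given the fraction of time the device is busy."""
    total_hours = lifetime_years * 365 * 24
    active_hours = total_hours * utilization
    return hardware_cost / active_hours

# Hypothetical: a $30,000 datacenter accelerator at 90% utilization vs. a
# $1,500 consumer GPU busy 2% of the time, both amortized over 4 years.
dc = amortized_cost_per_active_hour(30_000, 4, 0.90)
home = amortized_cost_per_active_hour(1_500, 4, 0.02)
print(f"datacenter: ${dc:.2f}/active-hour, home: ${home:.2f}/active-hour")
```

Under these assumed figures the 20x-more-expensive datacenter part still comes out cheaper per hour of useful work, which is the point about idle consumer silicon.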