CamouflagedKiwi 3 hours ago
It's not that you want it to be faster; you want the latency to be predictable and reliable, which is much more the case for local inference than for sending requests over a network (especially to the current set of frontier model providers, who don't exactly have standout reliability numbers).