With batched parallel requests, the per-request cost scales down further. Even a MacBook M3 on battery power can do inference quickly and efficiently. Large-scale training is the power hog.
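As a rough illustration of why batching helps, a single forward pass can serve several prompts at once, so fixed costs (loading weights, launching kernels) are amortized across the batch. The sketch below uses Hugging Face `transformers` with a placeholder model; the model name and prompts are illustrative, not from the original.

```python
# Minimal sketch of batched inference, assuming the `transformers` and `torch`
# packages are installed and a small local causal LM is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any small local causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
tokenizer.padding_side = "left"             # left-pad for decoder-only generation
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Several independent requests handled in one batched call.
prompts = [
    "Explain batching in one sentence:",
    "Why is inference cheaper than training?",
    "Name one benefit of on-device inference:",
]

# One generate() call processes all prompts together, so the per-request
# compute and energy cost drops compared to issuing them one at a time.
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=30)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```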