Also consider using Cerebras' inference APIs. They released a voice demo a while back, and their model inference latency is insanely low.