scosman 8 hours ago
I’m a big fan of WhisperKit for this, and they just added TTS. It’s great because they support features like speaker diarization (“who spoke when”) and custom dictionaries. Here’s a load test where they run 4 models in real time on the same device:

- Qwen3-TTS: text to speech
- Parakeet v2: Nvidia speech-to-text model
- Canary v2: multilingual / translation STT
- Sortformer: speaker diarization (“who spoke when”)