scosman 8 hours ago

I’m a big fan of WhisperKit for this, and they just added TTS. It’s great because it supports features like speaker diarization (“who spoke when”) and custom dictionaries.

Here’s a load test where they run 4 models in real time on the same device:

- Qwen3-TTS - text to speech

- Parakeet v2 - Nvidia speech to text model

- Canary v2 - multilingual / translation STT

- Sortformer - speaker diarization (“who spoke when”)

https://x.com/atiorh/status/2027135463371530695