syntaxing 8 hours ago
Is there something similar for STT? I'm using Whisper distill models and they work OK, but sometimes they get what I say completely wrong.
daemonologist 8 hours ago
Parakeet (https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) isn't really more accurate than Whisper, but it's much faster: faster than realtime, even on CPU. You have to use NeMo, though, or mess around with third-party conversions. (It also has a big brother, Canary (https://huggingface.co/nvidia/canary-1b-v2), and there's the confusingly named and positioned Nemotron speech: https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...)
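If you want to compare Whisper distill and Parakeet on your own recordings, "more accurate" usually comes down to word error rate (WER). WER counts word-level substitutions, deletions, and insertions against a reference transcript, then divides by the number of reference words. Here's a minimal sketch of that calculation. The function name and sample sentences are made up for illustration. Real evaluations usually normalize punctuation and casing more carefully; this version only lowercases and splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Hypothetical helper for illustration; normalization is just
    lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = word-level edit distance between the previous ref prefix and hyp[:j].
    # Starting row: turning an empty ref prefix into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # deleting all i ref words to reach an empty hyp
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("turn on the kitchen lights", "turn on the chicken lights"))  # 0.2
```

Run both models over the same clips with hand-checked transcripts and average the WER. That tells you more than a few memorable misfires do.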
phoronixrly 8 hours ago
From the other day: https://github.com/cjpais/Handy