bharatkalluri 4 days ago
For the past two days I've been working on SpeechShift [1], a fully local, offline-first speech-to-text utility. You trigger it with a command, it transcribes your speech with Whisper, and it pastes the text into whichever window is currently focused (Chrome, Typora, or anything else). Basically SuperWhisper [2], but for Linux. If this interests you, check it out! Feel free to ping me if something doesn't work as expected.

I've been trying to squeeze more performance out of Whisper, but I found that (at least for non-native speakers) the base model does a good job. For preprocessing I do VAD and some normalization. But on my rusty ThinkPad the processing time is way too long. I'll try some of the aforementioned tips and see whether accuracy and performance improve. After that, I'm planning to use an SLM (small language model) for text cleanup and post-processing of the transcription. I'm documenting my learnings in my notes [3].

[1] https://github.com/BharatKalluri/speechshift
[3] https://notes.bharatkalluri.com/speechshift-notes-during-dev...
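For anyone curious what "VAD and some normalization" before Whisper can look like, here's a minimal pure-Python sketch. It is not SpeechShift's code. It assumes 16 kHz mono float samples in [-1, 1] and uses peak normalization plus a crude energy-threshold VAD. The function names, the 30 ms frame size and the 0.02 RMS threshold are hypothetical choices for illustration. A trained VAD model (e.g. Silero VAD) is usually more robust than a fixed energy threshold.

```python
import math
from typing import List


def peak_normalize(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the loudest one has magnitude target_peak (hypothetical helper)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def energy_vad(
    samples: List[float],
    sample_rate: int = 16_000,
    frame_ms: int = 30,
    rms_threshold: float = 0.02,
) -> List[float]:
    """Crude VAD: keep only fixed-size frames whose RMS energy reaches rms_threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)  # 480 samples at 16 kHz / 30 ms
    voiced: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]  # last frame may be shorter
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= rms_threshold:
            voiced.extend(frame)
    return voiced


def preprocess(samples: List[float], sample_rate: int = 16_000) -> List[float]:
    # Normalize first so the fixed VAD threshold doesn't depend on microphone gain.
    return energy_vad(peak_normalize(samples), sample_rate)
```

Fed one second of audio that is half silence and half a quiet 440 Hz tone, `preprocess` drops the all-silent frames and hands only the voiced part (plus the one boundary frame) to the transcriber, which means less audio for Whisper to chew through on slow hardware.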
abdullahkhalids 4 days ago
Do you have any performance metrics? Have you tried it with languages other than English?