superluserdo a day ago
I implemented essentially this on top of whisper, since I couldn't find any implementation that allowed for live transcription: https://tomwh.uk/git/whisper-chunk.git/ I still need to get around to cleaning it up, but you can adjust the number of simultaneous overlapping whisper processes, the chunk length, and the chunk overlap fraction. I found that the `tiny.en` model, with multiple simultaneous listeners, is good enough for highly accurate live English transcription with 2-3s latency on a mid-range modern consumer CPU.
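To make the three knobs concrete, here is a rough sketch of how overlapping chunks could be scheduled across several workers. It is not taken from the whisper-chunk repository: `Chunk`, `plan_chunks`, and all parameter names are hypothetical, the round-robin worker assignment is an assumption, and the actual transcription call is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One audio window handed to a transcription worker (hypothetical type)."""
    index: int
    start_s: float
    end_s: float
    worker: int


def plan_chunks(total_s: float, chunk_s: float,
                overlap_fraction: float, n_workers: int) -> list[Chunk]:
    """Lay out overlapping windows over `total_s` seconds of audio.

    Assumption: consecutive windows start `chunk_s * (1 - overlap_fraction)`
    seconds apart and are assigned to workers round-robin.
    """
    if chunk_s <= 0 or n_workers < 1:
        raise ValueError("chunk_s must be positive and n_workers at least 1")
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")

    hop_s = chunk_s * (1 - overlap_fraction)  # spacing between window starts
    chunks = []
    index = 0
    while index * hop_s < total_s:
        start = index * hop_s
        end = min(start + chunk_s, total_s)  # clip the last windows to the audio
        chunks.append(Chunk(index, start, end, index % n_workers))
        index += 1
    return chunks


if __name__ == "__main__":
    for chunk in plan_chunks(total_s=10.0, chunk_s=4.0,
                             overlap_fraction=0.5, n_workers=2):
        print(chunk)
```

In this sketch a new window starts every `chunk_s * (1 - overlap_fraction)` seconds, so a larger overlap fraction means more frequent windows and more work per second of audio. That extra work is what the additional simultaneous workers would have to absorb.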