▲ | vunderba 4 days ago | |
Nice job. I made a similar python script available as a Github gist [1] a while back that given an audio file does the following: - Converts to 16kHz WAV - Transcribes using native ggerganov whisper - Calls out to a local LLM to clean the text - Prints out the final cleaned up transcription I found that accuracy/success increased significantly when I added the LLM post-processor even with modestly sized 12-14b models. I've been using it with great success to convert very old dictated memos from over a decade ago despite a lot of background noise (wind, traffic, etc). [1] https://gist.github.com/scpedicini/455409fe7656d3cca8959c123... |