Remix.run Logo
vunderba 4 days ago

Nice job. I made a similar python script available as a Github gist [1] a while back that given an audio file does the following:

- Converts to 16kHz WAV

- Transcribes using native ggerganov whisper

- Calls out to a local LLM to clean the text

- Prints out the final cleaned up transcription

I found that accuracy/success increased significantly when I added the LLM post-processor even with modestly sized 12-14b models.

I've been using it with great success to convert very old dictated memos from over a decade ago despite a lot of background noise (wind, traffic, etc).

[1] https://gist.github.com/scpedicini/455409fe7656d3cca8959c123...