I've been using FFmpeg and Whisper to record and transcribe live police scanner audio for my city, and to publish the transcripts in real time to a live website. It works great, aside from the expected transcription errors and hallucinations.
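
For anyone wanting to try something similar, here is a minimal sketch of how such a pipeline could be wired up. It is not necessarily the setup described above. The stream URL, directory, chunk length, and the helper names `build_ffmpeg_command` and `transcribe_chunk` are all hypothetical, and the transcription step assumes a model object from the `openai-whisper` package (`whisper.load_model(...)`).

```python
from pathlib import Path
from typing import List


def build_ffmpeg_command(stream_url: str, out_dir: str, segment_seconds: int = 30) -> List[str]:
    """Build an ffmpeg command that cuts a live audio stream into fixed-length WAV chunks.

    The URL, output directory, and chunk length are placeholders for illustration.
    """
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", stream_url,                      # live stream input (hypothetical URL)
        "-vn",                                 # ignore any video stream
        "-ac", "1",                            # downmix to mono
        "-ar", "16000",                        # resample to 16 kHz, the rate Whisper works at
        "-f", "segment",                       # segment muxer: write consecutive files
        "-segment_time", str(segment_seconds),
        str(Path(out_dir) / "chunk_%05d.wav"),
    ]


def transcribe_chunk(model, wav_path: str) -> str:
    """Transcribe one finished chunk.

    `model` is assumed to expose transcribe(path) -> {"text": ...}, as the
    object returned by whisper.load_model() in openai-whisper does.
    """
    result = model.transcribe(wav_path)
    return result["text"].strip()
```

The command list can be handed to `subprocess.Popen`, with a separate loop watching the output directory and calling `transcribe_chunk` on each completed file. Short chunks keep latency down but give the model less context, so the chunk length is worth tuning.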

Xunjin 21 hours ago | parent | next [-]

Is this website open? Would love to see your work :P

webinar 21 hours ago | parent [-]

somerville.votolab.com

jaster 20 hours ago | parent | next [-]

All the "Thanks for watching!" lines gave me a good chuckle. Reminds me of one of my own experiences with one of the Whisper models, where some random noise in the middle of the conversation was transcribed as "Don't forget to like and subscribe". Really illustrates where the training data is coming from.

mkayokay 20 hours ago | parent | prev [-]

Looks like this is a nice case where the LLM thinks that silence is "thanks for watching", which was discussed on here a few days ago.
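
One cheap mitigation is a post-filter on the transcript segments. A minimal sketch, assuming segments shaped like the dicts `openai-whisper` returns (with `text` and `no_speech_prob` keys). The phrase list, the 0.6 cutoff, and the function names are illustrative assumptions, not anything taken from the site above.

```python
import re

# Hypothetical blocklist; extend it with whatever shows up in your own transcripts.
HALLUCINATED_PHRASES = {
    "thanks for watching",
    "thank you for watching",
    "don't forget to like and subscribe",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation (keeping apostrophes), and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def is_probable_hallucination(segment: dict, max_no_speech_prob: float = 0.6) -> bool:
    """Flag empty segments, known filler phrases, and segments the model itself
    scored as likely non-speech. The 0.6 cutoff is an assumed starting point."""
    text = normalize(segment.get("text", ""))
    if not text or text in HALLUCINATED_PHRASES:
        return True
    return segment.get("no_speech_prob", 0.0) > max_no_speech_prob


def filter_segments(segments: list) -> list:
    """Return only the segments that pass the hallucination check."""
    return [s for s in segments if not is_probable_hallucination(s)]
```

An exact-match blocklist will miss variants and could drop a real utterance of the same words, so it is probably worth logging what gets filtered rather than discarding it silently.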

waltbosz 20 hours ago | parent | prev [-]

I wanted to do this for my local county council meetings. I think in this context speaker recognition would be important.