Yeah, something like that. I'm thinking of converting the audio to text with the Whisper API, indexing the transcripts in a vector DB, and then running string matching over them.
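Roughly, the flow I have in mind could look like the sketch below. It is only an illustration under some loud assumptions: `transcribe` is a hypothetical stand-in that returns canned text instead of calling Whisper, the file names are made up, and the "vector DB" is a toy in-memory list with bag-of-words vectors rather than a real embedding model or database.

```python
import math
import re
from collections import Counter


def transcribe(audio_path: str) -> str:
    """Hypothetical stand-in for a Whisper speech-to-text call.

    Assumption: returns canned text so the sketch runs offline.
    """
    fake_transcripts = {
        "standup.mp3": "We should ship the search feature next week.",
        "review.mp3": "The audio pipeline needs better error handling.",
    }
    return fake_transcripts[audio_path]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TranscriptIndex:
    """In-memory stand-in for a vector DB holding transcripts."""

    def __init__(self) -> None:
        self.rows = []  # each row: (audio_path, transcript, vector)

    def add(self, audio_path: str) -> None:
        text = transcribe(audio_path)
        self.rows.append((audio_path, text, embed(text)))

    def string_match(self, phrase: str) -> list:
        """Case-insensitive substring match over the stored transcripts."""
        needle = phrase.lower()
        return [path for path, text, _ in self.rows if needle in text.lower()]

    def nearest(self, query: str) -> str:
        """Return the audio file whose transcript vector is closest to the query."""
        query_vec = embed(query)
        return max(self.rows, key=lambda row: cosine(query_vec, row[2]))[0]


index = TranscriptIndex()
for path in ("standup.mp3", "review.mp3"):
    index.add(path)

print(index.string_match("search feature"))
print(index.nearest("error handling in the pipeline"))
```

The exact substring lookup and the similarity lookup are kept as separate methods on purpose: string matching only needs the stored transcript text, while the vectors matter when the query wording differs from what was said in the audio.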