Kind of. Whisper is a family of audio transcription (speech-to-text) models.
https://huggingface.co/search/full-text?q=whisper