3D30497420 3 days ago
Maybe take inspiration from how Home Assistant does local speech-to-text and text-to-speech? https://www.home-assistant.io/voice_control/voice_remote_loc... Pretty sure you'd need to host this on something more robust than an ESP32, though.
supermatt 3 days ago
Yeah, I was looking at Home Assistant as well, but it doesn't feel real-time, likely because the transcription stage is separate from the inference.