3D30497420 3 days ago

Maybe inspiration from how Home Assistant can do local speech-to-text and vice versa? https://www.home-assistant.io/voice_control/voice_remote_loc...

Pretty sure you'd need to host this on something more robust than an ESP32 though.

supermatt 3 days ago

Yeah, I was looking at Home Assistant as well, but it doesn't feel real-time, likely because the transcription stage is separate from the inference.