| ▲ | renewiltord 10 hours ago | |
What’s the current state of the art in low-power wake word detection and speech-to-text? Has anyone written a blog post on this? I was able to run speech-to-text on my old Pixel 4, but it’s a bit flaky (the background process occasionally loses the audio device). I just want to detect a wake word, send everything after it to a remote LLM, and then run TTS on the text I get back. | ||
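A minimal sketch of the loop described above (wake word, then remote LLM, then TTS), assuming fixed-length capture instead of real voice activity detection. `VoiceLoop`, `detect_wake`, `ask_remote`, and `speak` are hypothetical names standing in for whatever wake word engine, remote endpoint, and TTS engine you actually use; nothing here is a real library API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class VoiceLoop:
    detect_wake: Callable[[bytes], bool]  # hypothetical on-device wake word check
    ask_remote: Callable[[bytes], str]    # hypothetical: uploads audio, returns reply text
    speak: Callable[[str], None]          # hypothetical local TTS call
    utterance_frames: int = 3             # assumption: fixed-length capture, no VAD

    def run(self, frames: Iterable[bytes]) -> None:
        captured: List[bytes] = []
        awake = False
        for frame in frames:
            if not awake:
                # Idle state: only the cheap wake word check runs.
                awake = self.detect_wake(frame)
                continue
            # Awake state: buffer audio that follows the wake word.
            captured.append(frame)
            if len(captured) == self.utterance_frames:
                self.speak(self.ask_remote(b"".join(captured)))
                captured.clear()
                awake = False  # go back to waiting for the wake word


# Usage with stand-in callables instead of real audio, network, or TTS.
spoken: List[str] = []
loop = VoiceLoop(
    detect_wake=lambda frame: frame == b"WAKE",
    ask_remote=lambda audio: f"heard {len(audio)} bytes",
    speak=spoken.append,
)
loop.run([b"..", b"WAKE", b"aa", b"bb", b"cc", b"dd"])
print(spoken)
```

Keeping the three stages as plain callables means the expensive parts can live on another machine while only `detect_wake` runs continuously on the low-power device.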
| ▲ | geerlingguy 10 hours ago | |
Maybe not SOTA but the HA Voice Preview Edition [1] in tandem with a Pi 5 or some similar low-power host for the Piper / Whisper pipeline is pretty good. I don't use it but was able to get an Alexa/Google Home-like experience going with minimal effort. I was only using it for local Home Assistant tasks, didn't try anything further like retrieving sports scores, managing TODO lists, or anything like that. | ||
| ▲ | folmar 9 hours ago | |
Wake word detection is not expensive; you can do it on an ESP32 https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3... (and then send the audio to something beefier, as TTS will be marginal at best). | ||
| ▲ | monocasa 10 hours ago | |
Wake word models can be tiny: on the order of 10k weights, which runs on an ESP32 or similar with plenty of compute to spare. TinyML is a book that goes through the process of building a wake word model for such constrained environments. | ||
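To make the size claim concrete, here is a rough parameter count for a small conv-plus-dense keyword spotter. The layer shapes (a 49x40 spectrogram input, eight 10x8 filters at stride 2, four output classes) are illustrative assumptions, not figures taken from the book or from any particular model.

```python
import math


def conv2d_params(kh: int, kw: int, c_in: int, c_out: int) -> int:
    # One kh x kw x c_in kernel per output channel, plus one bias each.
    return kh * kw * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    # Fully connected weights plus one bias per output.
    return n_in * n_out + n_out


def same_padding_out(size: int, stride: int) -> int:
    # Output length of a strided conv with "same" padding.
    return math.ceil(size / stride)


# Assumed shapes, for illustration only.
IN_H, IN_W = 49, 40      # spectrogram: time frames x frequency bins
KH, KW, FILTERS = 10, 8, 8
STRIDE = 2
CLASSES = 4

conv = conv2d_params(KH, KW, 1, FILTERS)
out_h = same_padding_out(IN_H, STRIDE)
out_w = same_padding_out(IN_W, STRIDE)
dense = dense_params(out_h * out_w * FILTERS, CLASSES)
total = conv + dense

print(f"conv: {conv}, dense: {dense}, total: {total}")
print(f"int8 weights: about {total / 1024:.1f} KiB")
```

Under these assumptions the total is 16,652 parameters, so at one byte per weight the model is roughly 16 KiB, which is the same order of magnitude as the figure quoted above.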