renewiltord 10 hours ago

What’s the current state of the art in low power wake word and speech to text? Has anyone written a blog post on this?

I was able to run speech-to-text on my old Pixel 4, but it's a bit flaky (the background process occasionally loses the audio device). I just want to detect a wake word, send everything after it to a remote LLM, and run TTS on the text I get back.
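
Roughly, the loop I have in mind looks like the sketch below. Everything here is hypothetical: `is_wake_word`, `is_silence`, `ask_remote` and `speak` are placeholder callables standing in for whatever wake word detector, voice-activity check, remote STT/LLM call and TTS engine end up being used, and the frame format is left open.

```python
from typing import Callable, Iterable, List


def run_assistant(
    frames: Iterable[bytes],
    is_wake_word: Callable[[bytes], bool],  # hypothetical: cheap on-device detector
    is_silence: Callable[[bytes], bool],    # hypothetical: end-of-utterance check
    ask_remote: Callable[[bytes], str],     # hypothetical: remote STT + LLM, returns text
    speak: Callable[[str], None],           # hypothetical: local TTS
    max_frames: int = 100,                  # safety cap on utterance length
) -> None:
    """Idle until the wake word fires, then capture one utterance and answer it."""
    captured: List[bytes] = []
    listening = False
    for frame in frames:
        if not listening:
            # Only the cheap detector runs while idle; nothing leaves the device.
            listening = is_wake_word(frame)
            continue
        silent = is_silence(frame)
        if not silent:
            captured.append(frame)
        if silent or len(captured) >= max_frames:
            if captured:
                # Ship the whole utterance to the remote side and speak the reply.
                speak(ask_remote(b"".join(captured)))
            captured, listening = [], False
```

The point of the split is that only the wake word check has to run continuously on the low-power device; the expensive steps happen once per utterance.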

geerlingguy 10 hours ago | parent | next [-]

Maybe not SOTA but the HA Voice Preview Edition [1] in tandem with a Pi 5 or some similar low-power host for the Piper / Whisper pipeline is pretty good. I don't use it but was able to get an Alexa/Google Home-like experience going with minimal effort.

I was only using it for local Home Assistant tasks, didn't try anything further like retrieving sports scores, managing TODO lists, or anything like that.

[1] https://www.home-assistant.io/voice-pe/

folmar 9 hours ago | parent | prev | next [-]

Wake word detection is not expensive; you can do it on an ESP32 https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3... (and then send the audio to something beefier, as on-device speech-to-text will be marginal at best).

monocasa 10 hours ago | parent | prev [-]

A wake word model can be tiny: something like 10k weights, which runs on an ESP32 or similar with plenty of compute to spare.
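
For a rough sense of scale, here is a back-of-the-envelope parameter count for a hypothetical keyword spotter with one conv layer and one dense layer. The input shape, filter sizes and class count are assumptions picked for illustration, not taken from any specific model.

```python
import math


def conv2d_params(kernel_h: int, kernel_w: int, in_ch: int, out_ch: int) -> int:
    """Weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_ch * out_ch + out_ch


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def same_padding_out(size: int, stride: int) -> int:
    """Output length of a strided conv with 'same' padding."""
    return math.ceil(size / stride)


# Hypothetical input: 49 spectrogram frames x 40 frequency bins, 1 channel.
FRAMES, BINS = 49, 40
# Hypothetical conv layer: 8 filters of 10x8, stride 2 in both dimensions.
FILTERS, KERNEL_H, KERNEL_W, STRIDE = 8, 10, 8, 2
# Hypothetical outputs: silence, unknown, and two keywords.
CLASSES = 4

conv = conv2d_params(KERNEL_H, KERNEL_W, 1, FILTERS)
flat = same_padding_out(FRAMES, STRIDE) * same_padding_out(BINS, STRIDE) * FILTERS
dense = dense_params(flat, CLASSES)
total = conv + dense

print(f"conv={conv} dense={dense} total={total}")  # total is 16652
```

At one byte per weight (int8 quantization) that is roughly 16 KB, which is the same order of magnitude as the figure above and small next to the memory on an ESP32-class chip.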

TinyML is a book that goes through the process of building a wake word model for such constrained environments.