dfajgljsldkjag 5 hours ago
It requires a bit of tinkering, but I think pipecat is the way to go. You can plug in pretty much any STT/LLM/TTS you want and go. It definitely supports local models, but it's up to you to get your hands on those models. Not sure if there are any turnkey setups preconfigured for local install where you can just press play and go, though. Last I heard, end-to-end (E2E) speech-to-speech models are still pretty weak. I've had pretty bad results from gpt-realtime, and that's a proprietary model, so I'm assuming open source is a bit behind.
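For anyone unfamiliar with the cascaded STT -> LLM -> TTS idea, here's a rough sketch of the "plug in any component" pattern. This is NOT pipecat's actual API. `VoicePipeline`, `transcribe`, `reply`, and `synthesize` are hypothetical names, and the fake stages just stand in for whatever real local models you'd wire up.

```python
from dataclasses import dataclass
from typing import Protocol


# Each stage is just an interface; any implementation with the right method fits.
class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Hypothetical cascaded pipeline: speech -> text -> response text -> speech."""
    stt: STT
    llm: LLM
    tts: TTS

    def run_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)   # speech to text
        answer = self.llm.reply(text)       # text to response
        return self.tts.synthesize(answer)  # response to speech


# Dummy stand-ins; swap these for real local STT/LLM/TTS models.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = VoicePipeline(FakeSTT(), EchoLLM(), FakeTTS())
    print(pipeline.run_turn(b"hello"))  # b'You said: hello'
```

The point is that each stage only depends on an interface, so swapping one STT or TTS for another doesn't touch the rest. An E2E speech-to-speech model would collapse all three stages into one.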