| ▲ | armcat 18 hours ago |
| Super nice! I've been using Kokoro locally, which is 82M parameters and runs (and sounds) amazing! https://huggingface.co/hexgrad/Kokoro-82M |
|
| ▲ | machiaweliczny 16 hours ago | parent | next [-] |
| BTW, does anyone know of a good voice assistant stack that's open source? I used https://github.com/ricky0123/vad for voice activation, which works well, then just the Web Speech API as that's the fastest, and then a commercial TTS for speed, as I couldn't find a good one. |
|
| ▲ | machiaweliczny 17 hours ago | parent | prev [-] |
| I tried Kokoro-JS, which I think runs in the browser, and it was way too slow. Besides the latency, it also didn't support the language I wanted. |
| |
| ▲ | armcat 10 hours ago | parent [-] |
| I have a 5070 in my rig. What I'm running is Kokoro in a Python/FastAPI backend. I also use local quantized models (I swap between ministral-3 and Qwen3) as "the brains", offloading to GPT-5.2 (incl. web search) for "complex" tasks or those requiring the web. In the backend I use Kokoro to generate wav bytes that I send to the frontend. The frontend is just a simple HTML page with a textbox and a button that invokes `fetch()`. I type, and it responds in audio. The round-trip time is <1 second for me, unless it needs to call the OpenAI API for "complex" tasks. I have yet to integrate STT; once that's in, the cycle is complete. That's the stack, and it's not slow at all, but it depends on your HW. |
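For anyone wanting to try the "generate wav bytes in the backend" step, here is a minimal stdlib-only sketch of what that could look like. It is not armcat's actual code: `fake_synthesize` is a hypothetical stand-in for the Kokoro call (it just produces a tone), `to_wav_bytes` is an illustrative helper name, and the 24 kHz sample rate is an assumption based on Kokoro's published examples.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: matches the rate used in Kokoro's examples


def fake_synthesize(text: str, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Hypothetical stand-in for the TTS model: 50 ms of a 440 Hz tone per character."""
    n = int(0.05 * sample_rate) * len(text)
    return [0.3 * math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(n)]


def to_wav_bytes(samples: list[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples in [-1, 1] as mono 16-bit PCM WAV, entirely in memory."""
    # Clamp each sample, scale to the int16 range, and pack little-endian.
    pcm = struct.pack(
        f"<{len(samples)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


wav_bytes = to_wav_bytes(fake_synthesize("hi"))
```

In a FastAPI handler, bytes like these could presumably be returned with `Response(content=wav_bytes, media_type="audio/wav")`, and the page's `fetch()` call could read the body as a blob and play it through an `Audio` element. Keeping everything in memory avoids a temp-file round trip, which helps with latency.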
|