| ▲ | sschueller 4 hours ago | |
I'm still looking for the "perfect" setup to clone my voice and use it locally to send voice replies in Telegram via OpenClaw. Does anyone have such a setup? I want to be my own personal assistant... EDIT: I can give it an RTX 3080 Ti. | ||
| ▲ | ilaksh 4 hours ago | parent | next [-] | |
You need to provide info on your hardware. Pocket-TTS does cloning on CPU, but for me it randomly produces something pretty weird-sounding mixed in with roughly 90% good outputs, so it hasn't been stable enough to run without checking the output. Maybe it depends on your voice sample. Qwen 3 TTS is good for voice cloning but requires a GPU of some sort. | ||
| ▲ | bdbdbdb an hour ago | parent | prev | next [-] | |
Why not just send text replies? You can already do that. | ||
| ▲ | nicpottier 3 hours ago | parent | prev | next [-] | |
Try training a model with Piper. You will need to record a lot of utterances, but the results are pretty great and the output is a fast TTS model. | ||
| ▲ | justanotherunit 4 hours ago | parent | prev [-] | |
Isn't it just a matter of training a model on your voice recordings and then using that to generate audio clips from text? | ||