sschueller 4 hours ago

I'm still looking for the "perfect" setup to clone my voice and use it locally to send voice replies in Telegram via OpenClaw. Does anyone have such a setup?

I want to be my own personal assistant...

EDIT: I can provide it an RTX 3080 Ti.

ilaksh 4 hours ago | parent | next [-]

You need to provide info on your hardware. Pocket-TTS does cloning on CPU, but for me it randomly outputs something pretty weird-sounding, mixed in with roughly 90% good outputs. So it hasn't been quite stable enough to run without checking the output. But maybe it depends on your voice sample.

Qwen 3 TTS is good for voice cloning but requires a GPU of some sort.

bdbdbdb an hour ago | parent | prev | next [-]

Why not just send text replies? You can already do that.

nicpottier 3 hours ago | parent | prev | next [-]

Try training a model on Piper. You will need to record a lot of utterances, but the results are pretty great and the output is a fast TTS model.

justanotherunit 4 hours ago | parent | prev [-]

Isn't it just a matter of training a model on your voice recordings and then using that to generate audio clips from text?