| ▲ | gagan2020 2 days ago | |
It is not good for text-to-speech (TTS) either. I have been trying it for a few days. First of all, there is no documentation for the 1.5B model. The 0.5B realtime model is a shit model. I was converting text line by line, and it was randomly adding music and couldn't handle special characters like "…". I am really disappointed with this model, to say the least. | ||
| ▲ | Stagnant 2 days ago | |
The 7B-parameter VibeVoice TTS model is still the most impressive local TTS model I've tried. It was pulled by Microsoft a few days after its release due to "abuse potential", but it can be found in various community-maintained Hugging Face repos. | ||
| ▲ | tjungblut 2 days ago | |
Yep, it seems this was trained on a large amount of podcasts with ad jingles or phone call queues with elevator music. I was also pretty disappointed when I ran the TTS last week. | ||