guerrilla a day ago

I don't understand the comments here at all. I played the audio and it sounds absolutely horrible, far worse than computer voices sounded fifteen years ago. Not even the most feeble-minded person would mistake that for a human. Am I not hearing the same thing everyone else is hearing? It sounds straight-up corrupted to me. I tested it in different browsers and there was no difference.

sammyyyyyyy a day ago

As I said, some reference voices can lead to bad voice quality. But if it sounds that bad, that's probably not the cause. I'd love to dig into it if you want.

codefreakxff a day ago

I agree with the comment above. I have not logged into Hacker News in _years_ but did so today just to weigh in here. If people are saying that the audio sounds great, then there is definitely something going on with a subset of users, where we are only hearing garbled words with a LOT of distortion. This does not sound like natural speech to me at all. It sounds more like a warped cassette tape. And I do not mean to slight your work at all. I am genuinely puzzled as to why my perception of this is so radically different from others'!

guerrilla a day ago

Thank you for commenting. I wonder if this could be another situation like "the dress" (2015) or maybe something is wrong with our codecs...

Mashimo 21 hours ago

No, nothing is wrong with your codecs. It sounds shitty. But given the small size and speed, it's still impressive.

It's like saying .kkrieger looks like a bad game, which it does, but then again .kkrieger is only 96 KB or whatever.

guerrilla 21 hours ago

How big are TTS models like this usually?

.kkrieger looks like an amazing game for the mid-90s. It's incomprehensible that it's only 96 KB.

Mashimo 19 hours ago

Here is an overview: https://www.inferless.com/learn/comparing-different-text-to-...

Also keep in mind the processing time. The article above used an NVIDIA L4 with 24 GB of VRAM. Sopro claims a 7.5-second processing time on CPU for 30 seconds of audio!
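To put that claim in perspective: TTS speed is often compared as a real-time factor (RTF), i.e. processing time divided by the duration of the generated audio, where anything below 1.0 is faster than real time. A minimal sketch, taking Sopro's claimed figures at face value; the function name `real_time_factor` is hypothetical and not part of any tool mentioned here:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures as claimed for Sopro (not independently measured): 7.5 s of CPU time for 30 s of audio.
rtf = real_time_factor(7.5, 30.0)
speedup = 1 / rtf
print(f"RTF = {rtf:.2f}, i.e. about {speedup:.0f}x faster than real time")
```

That works out to an RTF of 0.25, roughly 4x faster than real time on CPU, which is why the speed is impressive even if the quality is not.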

If you want really good-quality TTS, you should check out elevenlabs.io.

Different tools for different goals.

guerrilla a day ago

I mean, I'm talking about the MP4. How could people possibly be worried about scammers after listening to that?

sammyyyyyyy a day ago

I didn't specifically cherry-pick those examples. You can try it for yourself. But thanks for the feedback anyway.

guerrilla a day ago

No shade on you. It's definitely impressive. I just didn't understand people's reactions.

jrmg 17 hours ago

It sounds like someone using an electrolarynx to me.

foolserrandboy a day ago

I thought it was RFK

serf a day ago

spasmodic dysphonia as a service.

wildmXranat 16 hours ago

Yes, if this selected piece is the best that was available to use as a showcase, its distortion and mangled pronunciation are immediately off-putting.

jackyysq a day ago

Same here. I tried a few different voices, including my kids' and my own, and the generated audio is not similar at all. It's not even a proper voice.

eigenvalue 14 hours ago

Thank you, I was scrolling and scrolling in utter disbelief. It sounds absolutely dreadful. Would drive me nuts to listen to for more than a minute.