Qwen3-TTS Family Is Now Open Sourced: Voice Design, Clone, and Generation(qwen.ai)
173 points by Palmik 4 hours ago | 40 comments
PunchyHamster 4 minutes ago | parent | next [-]

Looking forward to my grandma being scammed by one!

throwaw12 2 hours ago | parent | prev | next [-]

Qwen team, please please please, release something to outperform and surpass the coding abilities of Opus 4.5.

Although I like the model, I don't like the leadership of that company, how closed it is, and how divisive they are in terms of politics.

mortsnort an hour ago | parent | next [-]

They were just waiting for someone in the comments to ask!

mhuffman 26 minutes ago | parent [-]

It really is the best way to incentivize politeness!

TylerLives an hour ago | parent | prev | next [-]

>how divisive they are in terms of politics

What do you mean by this?

throwaw12 an hour ago | parent [-]

Dario said some not-so-nice things about China and open models in general:

https://www.bloomberg.com/news/articles/2026-01-20/anthropic...

vlovich123 an hour ago | parent | next [-]

I think the least politically divisive issue within the US is concern about China's growth, since it directly threatens the US's ability to set the world's agenda. It may be politically divisive if you are aligned with Chinese interests, but I don't see anything divisive about it for a US audience. I expect Chinese CEOs speak in similar terms to Chinese audiences about making sure they're decoupled from the now-unstable US political machine.

giancarlostoro 6 minutes ago | parent | prev | next [-]

From the perspective of competing against China on AI, the argument against open models makes sense to me. It's a terrible problem to have, really. Ideally we should all be able to work together in the sandbox towards a better tomorrow, but that's not reality.

I prefer to have more open models. On the other hand, China closes up its open models once they start to show a competitive edge.

Levitz 9 minutes ago | parent | prev [-]

I mean, there's no way it's about this right?

Being critical of favorable actions towards a rival country shouldn't be divisive, and if it is, well, I don't think the problem is in the criticism.

Also, the link doesn't mention open source? From a Google search, he doesn't seem to care much for it.

WarmWash an hour ago | parent | prev | next [-]

The Chinese labs distill the SOTA models to boost the performance of their own. They are a trailer hooked up (with a 3-6 month chain) to the trucks pushing the technology forwards. I've yet to see a trailer overtake its truck.

China would need an architectural breakthrough to leapfrog American labs, given the huge compute disparity.

miklosz 23 minutes ago | parent | next [-]

I have indeed seen a trailer overtake its truck. Not a pretty sight.

aaa_aaa 37 minutes ago | parent | prev | next [-]

No, all they need is time. I am awaiting the downfall of the AI hegemony and hype with popcorn at hand.

mhuffman 24 minutes ago | parent | prev [-]

I would be happy with an open-weight, 3-month-old Claude.

amrrs an hour ago | parent | prev | next [-]

Have you tried the new GLM 4.7?

davely 29 minutes ago | parent | next [-]

I've been using GLM 4.7 alongside Opus 4.5 and I can't believe how bad it is. Seriously.

I spent 20 minutes yesterday trying to get GLM 4.7 to understand that a simple modal on a web page (vanilla JS and HTML!) wasn't displaying when a certain button was clicked. I hooked it up to Chrome MCP in Open Code as well.

It constantly told me that it fixed the problem. In frustration, I opened Claude Code and just typed "Why won't the button with ID 'edit' work???!"

It fixed the problem in one shot. This isn't even a hard problem (and I could have just fixed it myself but I guess sunk cost fallacy).

bityard 11 minutes ago | parent [-]

I've used a bunch of the SOTA models (via my work's Windsurf subscription) for HTML/CSS/JS stuff over the past few months. Mind you, I am not a web developer, these are just internal and personal projects.

My experience is that all of the models seem to do a decent job of writing a whole application from scratch, up to a certain point of complexity. But as soon as you ask them for non-trivial modifications and bugfixes, they _usually_ go deep down rationalized rabbit holes to nowhere.

I burned through a lot of credits to try them all and Gemini tended to work the best for the things I was doing. But as always, YMMV.

KolmogorovComp a few seconds ago | parent [-]

Exactly the same feedback

throwaw12 an hour ago | parent | prev [-]

yes I did, not on par with Opus 4.5.

I use Opus 4.5 for planning; when I reach my usage limits I fall back to GLM 4.7 just for implementing the plan. It still struggles, even though I configure GLM 4.7 as both the smaller model and the heavier model in Claude Code.

Onavo 23 minutes ago | parent | prev | next [-]

Well DeepSeek V4 is rumored to be in that range.

sampton an hour ago | parent | prev [-]

Every time Dario opens his mouth it's something weird.

genewitch an hour ago | parent | prev | next [-]

It isn't often that technology gives me chills, but this did it. I've used "AI" TTS tools since 2018 or so, and I thought the stuff from two years ago was about the best we were going to get. I don't know the size of these; I scrolled to the samples. I am going to get the models set up somewhere and test them out.

Now, maybe the results were cherry-picked. I know everyone else who has released one of these cherry-picks which samples to publish. However, this is the first time I've considered it plausible to use AI TTS to remaster old radio plays and the like, where a section of audio is unintelligible but can be deduced from context, like a tape glitch where someone says "HEY [...]LAR!" and it's an episode of Yours Truly, Johnny Dollar...

I have dozens of hours of audio of like Bob Bailey and people of that era.
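
The splicing step of that remaster workflow is simple with the stdlib `wave` module. A minimal sketch, assuming 16-bit WAVs with matching sample rate and channel count (generating the replacement line with the TTS model is a separate step, and `splice_wav` is a made-up helper name):

```python
import wave

def splice_wav(original_path, patch_path, out_path, start_s, end_s):
    """Replace original audio between start_s and end_s with the patch audio.

    Assumes both files are WAVs with the same sample rate, channel count,
    and sample width, e.g. a TTS-generated line patched over an
    unintelligible stretch of an old radio play.
    """
    with wave.open(original_path, "rb") as orig:
        params = orig.getparams()
        frames = orig.readframes(orig.getnframes())
    with wave.open(patch_path, "rb") as patch:
        if (patch.getframerate() != params.framerate
                or patch.getnchannels() != params.nchannels
                or patch.getsampwidth() != params.sampwidth):
            raise ValueError("patch must match the original's format")
        patch_frames = patch.readframes(patch.getnframes())

    frame_bytes = params.sampwidth * params.nchannels  # bytes per frame
    start = int(start_s * params.framerate) * frame_bytes
    end = int(end_s * params.framerate) * frame_bytes
    spliced = frames[:start] + patch_frames + frames[end:]

    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(spliced)
```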

kamranjon 32 minutes ago | parent [-]

I wonder if it was trained on anime dubs, because all of the examples I listened to sounded very similar to a Miyazaki-style dub.

simonw 18 minutes ago | parent | prev | next [-]

If you want to try out the voice cloning yourself, you can do that at this Hugging Face demo: https://huggingface.co/spaces/Qwen/Qwen3-TTS - switch to the "Voice Clone" tab, paste in some example text, and use the microphone option to record yourself reading it - then paste in other text and have it generate a version of that text read in your voice.

javier123454321 3 minutes ago | parent [-]

This is terrifying. With this and z-image-turbo, we've crossed a chasm, and a very deep one. We are currently protected by screens: we can, and should, assume everything behind a screen is fake unless rigorously (and systematically, i.e. cryptographically) proven otherwise. We're sleepwalking into this, and not enough people know about it.

satvikpendem 12 minutes ago | parent | prev | next [-]

This would be great for audiobooks; some of the current AI TTS models still struggle there.

rahimnathwani 18 minutes ago | parent | prev | next [-]

Has anyone successfully run this on a Mac? The installation instructions appear to assume an NVIDIA GPU (CUDA, FlashAttention), and I’m not sure whether it works with PyTorch’s Metal/MPS backend.
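The CUDA-only pieces (FlashAttention) generally have to be skipped rather than ported; the backend fallback itself is simple. A sketch of that logic (`pick_device` is a hypothetical helper; the `torch` calls in the comments are real APIs, shown without requiring torch):

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple's Metal/MPS backend, then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

# In practice:
# import torch
# device = pick_device(torch.cuda.is_available(),
#                      torch.backends.mps.is_available())
# model = model.to(device)  # and skip FlashAttention when not on CUDA
```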

javier123454321 8 minutes ago | parent [-]

I recommend using Modal for renting the metal.

thedangler an hour ago | parent | prev | next [-]

Kind of a noob here: how would I run this locally? How do I pass it audio to process? I'm assuming it's in the API spec?

dust42 an hour ago | parent [-]

Scroll down on the Hugging Face page; there are code examples and also a link to GitHub: https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base

ideashower an hour ago | parent | prev | next [-]

Huh. One of the English Voice Clone examples features Obama.

indigodaddy an hour ago | parent | prev | next [-]

How does the cloning compare to pocket TTS?

albertwang 2 hours ago | parent | prev | next [-]

Great news, this looks great! Is it just me, or do most of the English audio samples sound like anime voices?

reactordev an hour ago | parent | next [-]

The real value I see is being able to clone a voice and change timbre and characteristics of the voice to be able to quickly generate voice overs, narrations, voice acting, etc. It's superb!

rapind 2 hours ago | parent | prev | next [-]

> do most of the english audio samples sound like anime voices?

100%. I was thinking the same thing.

devttyeu 2 hours ago | parent | prev | next [-]

Also like some popular YouTubers and speakers.

pixl97 an hour ago | parent [-]

Hmm, wonder where they got their training data from?

thehamkercat an hour ago | parent | prev | next [-]

Even the Japanese audio samples sound like anime.

htrp an hour ago | parent | prev [-]

Subbed audio makes for much better training data than closed-caption data.

wahnfrieden 39 minutes ago | parent | prev | next [-]

How is it for Japanese?

lostmsu 2 hours ago | parent | prev [-]

I still don't know anyone who has managed to get Qwen3-Omni to work properly on a local machine.