steinvakt2 2 days ago

This is not a new model. Also, it hallucinates a lot. Also, it's very heavy and slow at inference. It's also bad at multilingual speech.

Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.

terbo 2 days ago

It has some perks and is a bit more expressive in some cases, but overall it's trained on really noisy data, uses more memory, and isn't that fast. I'm talking about the (7b?) version that they released and then quickly removed (vibevoice-community on GitHub). I still use Chatterbox Turbo and sometimes Qwen TTS.

lblock 2 days ago

Yeah, I don't get why it's suddenly getting so much attention today; it's all over Twitter too.

xnx 2 days ago

Simonw (who has a bit of a Midas touch for posts here) just posted about it https://simonwillison.net/2026/Apr/27/vibevoice/

realty_geek 2 days ago

To be fair, his Midas touch is a result of consistency and a lot of hard work.

It's like the gardener at one of the Oxford colleges said - it's really easy to create these perfect lawns, just turn up every day and trim and water it - for a couple hundred years.

soperj 2 days ago

I thought they rolled it as well?

ffsm8 2 days ago

As always with people: listen to what they say, not to what they do...

After all, they rarely do what they say themselves, so it's surely not entirely made up nonsense!

GuinansEyebrows 2 days ago

there is so much more subversive marketing out there than any of us can really fathom. i try not to be too paranoid but it's getting a lot harder every day.

i know someone who worked in what we might call the 'astroturfing' space within the entertainment industry. after having a few discussions with him and with things like this[0] becoming more known, it's really difficult to afford any assumption of organic intent when money is on the line - especially at the scale that microsoft works at compared to something as comparatively quaint as the music industry.

[0] https://www.wired.com/story/geese-chaotic-good-marketing-ind...

ramon156 2 days ago

well duh, they updated the news section

https://github.com/microsoft/VibeVoice/commit/e73d1e17c3754f...

which is microsoft for "we removed two dead links". AI innovation knows no limits!

Vinnl 2 days ago

Interestingly that seems to be in response to [1], which might indeed be the trigger for this.

[1] https://doublepulsar.com/microsoft-vibing-capturing-screensh...

gagan2020 2 days ago

It's not good for text to speech (TTS) either. I have been trying it for a few days. First of all, there is no documentation for the 1.5B model. The 0.5B realtime model is a shit model. I was converting text line by line, and it randomly added music and couldn't handle special characters like "…".

I was really disappointed with this model, to say the least.

Stagnant 2 days ago

The 7B-parameter VibeVoice TTS model is still the most impressive local TTS model I've tried. It was pulled by Microsoft a few days after its release due to "abuse potential", but it can be found in various community-maintained Hugging Face repos.

tjungblut 2 days ago

yep, it seems this was trained on a large amount of podcasts with ad jingles, or phone-call queues with elevator music. I was also pretty disappointed when I ran the TTS last week.

narrationbox 2 days ago

Yes, the SOTA is currently much more advanced.

steinvakt2 a day ago

What do you consider to be SOTA?

zuzululu 2 days ago

you saved us a lot of time here.... i unstarred the repo

moving on....

Capricorn2481 2 days ago

I don't really pay attention to stars. Do people use them as bookmarks? Why would you star a repo if you knew so little about it?

drusepth 2 days ago

Stars for me are basically "this might be interesting but I don't have time to look at it now, hopefully I'll think about it later and give it a second look".

einsteinx2 2 days ago

I exclusively use stars as bookmarks which is why I always found it strange when people talked about lots of stars meaning high quality or trustworthy…I’ve learned since then that I’m probably in the minority (both in using stars as bookmarks and not caring about how many stars a repo has).

tombert 2 days ago

Judging by how many people apparently are paying bots to give their lazily vibe-coded repos thousands of stars, it seems like people both simultaneously take stars seriously while not taking them seriously at all. It breaks my brain.

Tamatarr 2 days ago

Saved me a lot of time, thanks!

scotty79 2 days ago

You just saved me an afternoon.

tombert 2 days ago

I'm shocked, shocked to find that Microsoft takes credit for a slow, unoriginal product that doesn't actually do what it advertises.

logicchains 2 days ago

Imagine the balls it took to willingly attach the Microsoft label to the front of the product that is Teams.

tombert 2 days ago

I mean the same can be said about most versions of Windows as well. People act like Windows 11 is where it all went sour, but I've personally kind of hated it since Windows XP.

I feel like a recurring pattern with Microsoft is to create something quickly, market it aggressively and push for everyone to use it immediately, and only once it is installed everywhere do people suddenly realize how terrible it is, but it's too late to change.

NBJack 2 days ago

I'm surprised you picked XP as the turning point. I didn't enjoy the days of reinstalling 95/98/ME every 6 months to avoid driver weirdness and seemingly random failures. XP was built on the foundation of 2000, which tended to make it more robust than its predecessors.

Vista on the other hand...

tombert 2 days ago

I mean, part of it is that I really hated the Fisher Price look to it, but it was also the first time I ever felt like I had to "hack" things to make stuff work. I had to muck with registry keys. Oh, and it was the first time that I noticed that Windows repair tools do not work.

I suspect I might have hated 9x more, but I was pretty young when those came out and I didn't really "get into" computers until XP, and I disliked it enough to dual-boot Linux as a twelve-year-old.

SecretDreams 2 days ago

[flagged]

NobleLie 2 days ago

The nuance is lost on LLM agentic dominant partakers.