| ▲ | Nvidia Fugatto: "World's Most Flexible Sound Machine" (blogs.nvidia.com) |
| 73 points by microsoftedging 2 days ago | 47 comments |
| |
|
| ▲ | ahofmann 2 days ago | parent | next [-] |
While this might be a technical breakthrough, none of the examples sounded any good. Every aspect of the provided sounds is bad. The music sounds muffled and badly mixed. The generated beat doesn't groove and has nothing interesting in it. The barking saxophone sounded just bad. The voices sounded somewhat convincing. In general, I think AI-generated audio makes it much more noticeable how utterly bad everything AI generates is. I already absolutely hate the two AI voices that appear in a lot of YouTube videos; they're a reason for me to close the video immediately most of the time. |
| |
| ▲ | leopoldj 2 days ago | parent | next [-] | | While I agree with you, this is the release of a research paper [1] and some accompanying demos on GitHub [2]. This is not a finished product fine-tuned for high-quality output. [1] https://d1qx31qr3h6wln.cloudfront.net/publications/FUGATTO.p... [2] https://fugatto.github.io/ | |
| ▲ | RobinL 2 days ago | parent | prev | next [-] | | With apologies for the X link, here is an example from Suno which felt very musical to me:
https://x.com/sunomusic/status/1857501332560818342 Here's another example on the Suno website:
https://suno.com/song/fc991b95-e4e9-4c8f-87e8-e5e4560755e7 | | |
| ▲ | ben_w 2 days ago | parent | next [-] | | There are things I like from Suno, but, having used it to make quite a lot, I also get vibes of something subtly wrong that I can't put my finger on, which I assume is somewhere between the audio version of bad kerning and Cronenberg fingers. Too many examples of vocoder/autotune in the training set, perhaps? That said, I mostly prefer AI over "real" human-made recordings (pop, classical, metal, bardcore, whatever) because I tend to learn the patterns too fast to enjoy, or really even tolerate, any recording more than about 3 times* — I assume I'd like live jazz for longer, but have only been to one place that ever had it so I don't know if it breaks that pattern. * sole exception: TV theme tunes, though the point of them isn't to listen to them | |
| ▲ | numpad0 2 days ago | parent | prev | next [-] | | I don't find any problems whatsoever in those audio clips, but I'm not an avid music listener, so out of intuition I'd guess the same underlying issue as with image generation is happening: AI makes technically horrible and rage-inducing filler that lacks high-level semantic structure, but average people have no words or experience to assess and describe what's going on. | | |
| ▲ | ahofmann 2 days ago | parent [-] | | > I don't find any problems whatsoever in those audio clips I think this is why there is no real, powerful protest against all this generated stuff. Only the people who care are able to articulate what's wrong with it. To me, all AI-generated content sounds horrible. To almost everyone else, it sounds OK. So we will see and hear more of this generated stuff. We are in the middle of the enshittification of all consumable media. | | |
| ▲ | ben_w 2 days ago | parent | next [-] | | I think there are a lot of different reasons all going on simultaneously. Most human musicians have very little power; that's been going away for a long time, ever since the "canned music" "robots" pushed live bands out of cinemas a century ago: https://www.smithsonianmag.com/history/musicians-wage-war-ag... Most popular music already feels, and to an extent is, fake, and not only because mere recording allows repeated takes until it's inhumanly "perfect". When I played an MP3 of Britney Spears to my mum around the turn of the century, she thought it was a robot singing because of the autotune. The Monkees were famously an attempt at a manufactured band whose members just happened to not feel like playing that game and did it for real; Gorillaz is even more obviously manufactured. Parasocial relationships are inherently different from "real" relationships, but the performers have to pretend it's personal when they address a crowd or a camera. Axis of Awesome demonstrated the similarity of most modern hits with their "4 Chords": https://youtu.be/oOlDewpCfZQ?feature=shared Those with the power were, and possibly still are, the record labels. But if the AIs are trained on the works of small musicians who can't afford the copyright cases or the political influence, and whose works are not under the umbrella of the labels who do have those resources but lack the right or the short-term motivation to intercede on their behalf, the big labels themselves may lose the consumer market to free AI output, while professionals will dismiss both the AI output and the labels' output as "just different kinds of slop, but both slop" (or whatever the current insult du jour is for AI). | |
| ▲ | numpad0 2 days ago | parent | prev | next [-] | | Agreed. To me AI-generated images look horrible, and AI-generated audio is still somewhat gut-twisting but less painful. AI-generated code works for HTML/CSS+JS, but not so well for other languages. AI-generated e-commerce reviews ... on par with human reviews? I'm starting to think that what AI might be replacing is the high end of consumption, not the low end of generation. Art has its followers, whose works are often less historically significant than the genre-pioneering ones. Doesn't that seem like what AI is doing? | |
| ▲ | com2kid 2 days ago | parent | prev | next [-] | | People were happy with the included wired iPhone earbuds for years, even though they were terrible. Listening on a laptop speaker, Suno sounds fine. Listening on my wireless earbuds it is... OK. These days I'm too lazy to pull out any of my high-quality wired headphones, and if somebody who used to care about sound quality enough to purchase multiple HQ headphones can't be arsed, then the general public really is going to think everything is just fine. | | |
| ▲ | numpad0 2 days ago | parent [-] | | That kind of quality limitation is not the point, nor are extra digits in images. Generative AI outputs trigger an uncanny-valley discomfort that professionals and connoisseurs are better equipped to verbalize. The real question is what the point of stuff like that is, and whether it's good, or even safe, for us to consume. |
| |
| ▲ | anonzzzies 2 days ago | parent | prev [-] | | I find almost all popular music made in the past 20+ years quite terrible. This is not worse. For people who enjoy this chewing-gum stuff, which seems to be most of the population of Earth, this is fine. And as such, this will be all popular music in the future: upload your voice, pick a style, generate 13 songs, go on tour to make money. | |
| ▲ | codedokode 2 days ago | parent [-] | | > go on tour Why go on tour if you can send an AI singer instead, and if you cannot sing as well as it anyway? | |
| ▲ | anonzzzies 2 days ago | parent [-] | | They want to believe it is human; however, when the robots get good enough... That's further away, though, maybe. |
|
|
|
| |
| ▲ | ahofmann 2 days ago | parent | prev [-] | | Suno is by far the best AI-generated music I've heard. That said, it is hot garbage. I've listened to both songs on my Bose QC Ultra headphones, which are far from perfect headphones. But even on them, the female voice has unbearable resonances in the higher frequencies. The male voice sounds mostly OK, but it also has something that sounds like compression artifacts (like MP3 compression, not loudness compression). All the instruments in these songs have these problems. They sound somewhat like the real thing, but really badly recorded. Also, the mixing isn't any good. It is still very impressive that AI can generate that. But if I recorded my band and someone delivered a mix like that, I would fire them immediately. Heck, I would be furious that they fucked up so badly and would try to get my money back. So the two links you provided just confirm what I said. | | |
| ▲ | CraftingLinks 2 days ago | parent | next [-] | | I use Suno the way a producer in a music studio hires musicians to bring ideas to life. I wish more features in Suno would empower music producers. I sample pieces, remix doodles, get ideas for continuing my tracks... I can see the future, and as an amateur, it's just liberating and a lot of fun. | |
| ▲ | snapcaster 2 days ago | parent | prev [-] | | Really interesting, haven't listened to their output with high quality speakers or anything like that. Do poorly made human recordings have this problem or is this currently a signal of AI generation? |
|
| |
| ▲ | codedokode 2 days ago | parent | prev [-] | | This might be because of dataset quality, since most high-quality content is in commercial music and sample libraries. | |
| ▲ | squarefoot 2 days ago | parent [-] | | This. And the world isn't ready for that, including copyright laws, which must be radically changed in a way that doesn't harm innovation. Suno v4 has become a complete disaster for some genres, and that could be due to the lawsuit forcing them to retrain the model on non-copyrighted works, which in my opinion is pure bollocks. Imagine forcing an artist to unlearn what they listened to in their youth, the music that helped forge their personal style.
Sorry, but I'm pessimistic. If we don't change how copyright works, pretty much every development in the field will be ruined by greedy copyright holders and their lawyers as soon as it shows any capability of producing decent music that even barely resembles something else. | |
| ▲ | codedokode 2 days ago | parent | next [-] | | Shouldn't the author be able to decide whether their work may be used for generative AI? > Imagine forcing an artist to unlearn Mathematical models cannot learn. What actually happens is that the owner of the generative AI takes a bunch of copyrighted works which took a lot of effort and money to produce (instruments, mics and other equipment are super expensive), puts them into a computer, and sells whatever the computer has calculated from those recordings. Do you see any learning or any creativity here? There were cases where Suno (or Udio) reproduced producer tags almost verbatim (but in lower quality), for example. This shows that the model was not simply calculating probabilities of patterns of pitches, durations, etc., but was storing the copyrighted content almost unmodified. Also, I personally have no interest in a service that generates a song for you, because it takes away all the fun. Maybe something that helps find mistakes in composed music and helps with learning would be much more useful. | |
| ▲ | jojo_ 2 days ago | parent | next [-] | | Lots of artists can reproduce existing content; should we get rid of them entirely, or just restrict them from publishing such content? If anything, it's the responsibility of the publisher to avoid copyright infringement. > Also, I personally have no interest in a service that generates a song for you, because it takes away all the fun. Maybe something that helps find mistakes in composed music and helps with learning would be much more useful. You are not forced to use the full raw output; you could use it sparingly in your new composition, the same way you might use ChatGPT to improve your lyrics. All my non-musician friends were thrilled to generate music. It's already extremely fun and will keep getting better. I think it lowers the barrier to entry and will increase the total number of performers, the "real" musicians. I am sure musicians playing instruments back in the day had the same idea about digital music: "Not playing physical instruments takes away all the fun. You can't touch, smell, or feel it. It has a negative impact on the music and on the people. I am so smart, I am a democrat, you guys are nazis, you want to destroy humanity while I want to restrict the majority of the people from having fun." > lot of effort and money to produce
Mathematical formulas do too, and you can't copyright them. If a new device were invented to replay memories "almost verbatim (but in lower quality)", should its use be restricted with regard to copyrighted content? It's your memory, your unique interpretation; shouldn't it belong to you? AI will get better, and you'll be able to easily go up the tree of which content was derived from what (intentionally or not) based on similarity and publication date. Artists don't need more protections than mathematicians. | |
| ▲ | squarefoot 2 days ago | parent | prev [-] | | > Do you see any learning or any creativity here? Of course not if we take it to the extreme, i.e. only copyrighted work reproduced almost identically. But I've used the platform with my own music and it reorganized it in a very interesting way, actually inspiring new songs and arrangements which I'll probably play with real instruments. I haven't the slightest interest in replicating top-chart garbage; however, lawsuits by major labels are also ruining the creative side, where no copyrighted work is involved. Suno is now quite likely retraining their model only on free music because of the lawsuits, and despite the hype, the latest version turned out awful for some genres. |
| |
| ▲ | Arainach 2 days ago | parent | prev [-] | | >Sorry, but I'm pessimistic. If we don't change how copyright works, pretty much every development in the field will be ruined by greedy copyright holders and their lawyers Sorry, but I'm pessimistic. If we don't change how AI regulation works, pretty much every creative field will be ruined by greedy tech companies and their planet-burning plagiarism devices. |
|
|
|
|
| ▲ | SonOfLilit 2 days ago | parent | prev | next [-] |
| The description is amazing, but the demo video feels underwhelming. Available music generation models sound much more musical and have much better diction on vocals. |
| |
| ▲ | codedokode 2 days ago | parent [-] | | This might be due to the quality of the dataset, because Nvidia seems not to be using copyrighted commercial recordings (if I read their paper properly). It is difficult to compete with those who have used a larger and higher-quality dataset without permission. |
|
|
| ▲ | olau 2 days ago | parent | prev | next [-] |
I would love to see a model focused on making virtual instruments. There are sample-based virtual instruments, but they miss some subtleties, and there are physics-based ones where some subtleties are preserved but which generally sound worse, because actually modelling real instruments that have evolved over centuries is really difficult. Even hardware-based instruments like electric guitars/violins/cellos etc. generally sound distinct from and less interesting than their acoustic counterparts. Electric guitar players seem to use various amplifier tricks to make up for that, and that's now a big separate instrument in its own right. But I think the point stands. |
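To make the "physics-based" idea above concrete, here is a minimal sketch of Karplus-Strong plucked-string synthesis in Python (assuming NumPy and SciPy are available; the function name and parameters are illustrative, not taken from any real product):

```python
import numpy as np
from scipy.io import wavfile

def pluck(freq=110.0, sr=44100, duration=2.0, damping=0.996):
    """Very rough physical model of a plucked string (Karplus-Strong)."""
    n = int(sr / freq)                     # delay-line length sets the pitch
    buf = np.random.uniform(-1.0, 1.0, n)  # a burst of noise models the pluck
    out = np.empty(int(sr * duration))
    for i in range(len(out)):
        out[i] = buf[i % n]
        # averaging neighbouring samples acts as a lowpass "string loss" filter
        buf[i % n] = damping * 0.5 * (buf[i % n] + buf[(i + 1) % n])
    return out

wavfile.write("pluck.wav", 44100, (pluck() * 32767).astype(np.int16))
```

Even this toy model hints at why the approach is hard: real instruments add dispersion, body resonances and nonlinear excitation on top of the basic string, each needing its own careful modelling.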
| |
|
| ▲ | codedokode 2 days ago | parent | prev | next [-] |
If you use it for work, AI might be OK, but generating a guitar or piano track is zero fun compared to playing a real instrument (even if the AI track sounds better). I think we should not forget this part either. But what about an AI guitar that automatically frets the strings properly if you don't press them hard enough? Or an AI piano that shifts the keyboard when it sees you are about to hit the wrong key? Many instruments require a lot of practice before you can produce an acceptable sound. Can AI help with this? |
| |
| ▲ | norir 2 days ago | parent [-] | | Not only do the instruments require practice to sound good (I've been playing electric bass for three years and am just beginning to sound better than bad), but a huge part of the process is learning to listen to the instrument and make adjustments. The beauty is that you can immediately hear the result of the adjustment. If it sounds better, you keep it. Otherwise you keep adjusting until you get closer to what you're looking for. With a prompt-based AI tool, it is not possible to make low-latency adjustments. Even if you could, how would you articulate the subtle adjustment to the LLM? My sense is that, contrary to the marketing, AI tools will be most useful to people who already have musical skill, and will actively subvert musical development in most people who rely on them too early in their process. |
|
|
| ▲ | ZoomZoomZoom 2 days ago | parent | prev | next [-] |
Most audio and music AI has the wrong incentives and is moving in a different direction from what professionals need. Almost all publicized innovations in the sphere are complex one-stop-shop solutions which aim to completely replace as many members of the creative process as possible. It's a corporate dream: a thing that spews barely passable, generic mush that's totally aligned with the demands of the decision makers, but has zero opinion, zero ambition, zero professional pride, and no need to uphold ethical and aesthetic standards or its own reputation whatsoever. Instead of tools for creatives, we have systems that generate complete tracks from Tinder chat logs. On the other hand, there's still no publicly available audio style transfer with even remotely usable quality (that thing from Google is abysmal). All I want for starters is something that turns the slightly distorted, over-reverberated, not perfectly intonated flute recording a client sends me into a clean, workable track. I don't even ask for it to turn it into a koto or a marimba or whatever you think is a cool demonstration case! Sorry for the rant, but it's all very frustrating and alarming. |
| |
| ▲ | tgv 2 days ago | parent [-] | | Barely passable, indeed. And then to imagine that the MBAs really are going to fire staff and downsize contractors because of this. More money for them: is that the incentive here? | |
|
|
| ▲ | SushiHippie 2 days ago | parent | prev | next [-] |
Does anyone know what melody this is at 2:07 in the video? https://youtu.be/qj1Sp8He6e4?t=2m7s |
|
| ▲ | ecocentrik 2 days ago | parent | prev | next [-] |
Is it being trained on noticeably compressed audio, or is it just outputting highly compressed audio? Can someone explain what the benefit of either would be, outside of specifically asking for the sound of audio compression artifacts? As others have pointed out, existing generative music services already output much higher-fidelity audio. |
|
| ▲ | camillomiller 2 days ago | parent | prev | next [-] |
Another day, another model made by engineers who think their technical prowess requires absolutely no understanding of the subtleties of human creativity. |
|
| ▲ | olup 2 days ago | parent | prev | next [-] |
They say the models are under 3B parameters. Even if only for voice generation, it sounds pretty good, no? |
|
| ▲ | varispeed 2 days ago | parent | prev | next [-] |
| Please don't put headphones over a cat's head and especially don't play any loud music! |
| |
| ▲ | ben_w 2 days ago | parent [-] | | Mm. The headphones aren't on the ears — a surprisingly common error even in pre-AI human-made cartoons. | | |
|
|
| ▲ | pil0u 2 days ago | parent | prev | next [-] |
| Am I the only one feeling weird about the image they chose to illustrate the article? I'm not a professional in that field but I would probably feel offended if coding assistants were presented with a monkey in front of a computer. |
| |
| ▲ | dagw 2 days ago | parent | next [-] | | "Cool cat" has been a slang term of high praise for jazz musicians since at least the '50s. |
| ▲ | gloflo 2 days ago | parent | prev | next [-] | | It's low-quality AI slop. That's already offensively disrespectful to the readers on its own. |
| ▲ | Cumpiler69 2 days ago | parent | prev | next [-] | | >I would probably feel offended if coding assistants were presented with a monkey in front of a computer As a professional monkey in front of a computer, I feel offended. | |
| ▲ | throw310822 2 days ago | parent | prev | next [-] | | Uh? Apart from the fact that the symbolism of a monkey and of a cat is entirely different, I imagined it was because gato/gatto means cat in Spanish/Italian. | |
| ▲ | Klaster_1 2 days ago | parent [-] | | Same in Greek! I was pleasantly surprised to see a cat; I think this was a nice touch. |
| |
| ▲ | gus_massa 2 days ago | parent | prev | next [-] | | Nah. Cats are cool. S/he has a smart, cool look. Most people would like it. (Not everyone.) Disclaimer: I prefer dogs, they are more friendly and even part of the family, but I have to recognize that cats look cooler. |
| ▲ | ipsum2 2 days ago | parent | prev [-] | | [flagged] |
|
|
| ▲ | popalchemist 2 days ago | parent | prev [-] |
| Will they be releasing weights? |