Remix.run Logo
doctorpangloss a day ago

I work in image diffusion rather than “LLMs.”

RAGs are the ControlNet of image diffusion. They exist for many reasons, some of those are that context windows are small, instruct-style frontier models haven’t been adequately trained on search tasks, and reason #1: people say they need RAGs so an industry sprouts up to give it to them.

Do we need RAGs? I guess for now yes, but in the near future no: 2/3 reasons will be solved by improvements to frontier models that are eminently doable and probably underway already. So let’s answer the question for controlnets instead to illuminate why just because someone asks for something, doesn’t mean it makes any sense.

If you’re Marc Andreesen and you call Mike Ovitz, your conversation about AI art generation is going to go like this: “Hollywood people tell me that they don’t want the AI to make creative decisions, they want AI for VFX or to make short TikTok videos or” something something, “the most important people say they want tools that do not obsolete them.” This trickles down to the lowly art director, who may have an art illustration background but who is already sending stuff overseas to be done in something that resembles functionally a dehumanized art generator. Everybody up and down this value chain has no math or English Lit background so to them, the simplest, most visual UX that doesn’t threaten their livelihood is what they want: Sketch To Image.

Does Sketch to image make sense? No. I mean it makes sense for people who cannot be fucked to do the absolutely minimal amount of lift to write prompts, which is many art professionals who, for the worse, have adopted “I don’t write” as an identity, not merely some technical skill specialization. But once you overcome this incredibly small obstacle of writing 25 words to Ideogram instead of 3 words to Stable Diffusion, it’s obvious: nobody needs to draw something and then have a computer finish it. Of course it’s technologically and scientifically tractable to have all the benefits of controlnets like, well control and consistency, but with ordinary text. But people who buy software want something that is finished, they are not waiting around for R&D projects. They want some other penniless creative to make a viral video using Ideogram or they want their investor’s daughter’s heir boyfriend who is their boss to shove it down their throats.

This is all meant to illustrate that you should not be asking people who don’t know anything what technology they want. They absolutely positively will say “faster horses.” RAGs are faster horses!