onion2k 3 days ago
Someone tried to generate a retro hip-hop album cover image with AI, but the text is all nonsense, and humans would have to be hired to clean up that AI slop.

In about two years we've gone from "AI just generates rubbish where the text should be" to "AI spells things pretty wrong." This is largely down to trying to generate the whole image, text included, in a single pass. Using a model like SDXL with a LoRA like FOOOCUS to do inpainting on an input image that has a very rough approximation of the right text (added via MS Paint), you can get a pretty much perfect result. Give it another couple of years and the text generation will be spot on.

So yes, right now we need a human to either use the AI well, or to fix it afterwards. That's how technology always goes - something is invented, it's not perfect, humans need to fix the outputs, but eventually the human input diminishes to nothing.
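[Editor's note: a minimal sketch of the inpainting-over-rough-text workflow described above, assuming the Hugging Face diffusers library and an SDXL inpainting checkpoint; the file names, album title, and parameter values are placeholders, not the commenter's exact setup.]

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# SDXL inpainting checkpoint from the Hugging Face hub.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

# Cover with a rough approximation of the title pasted in (e.g. via MS Paint),
# plus a mask that is white over the region the model should redraw.
init_image = load_image("cover_with_rough_text.png")
mask_image = load_image("text_region_mask.png")

result = pipe(
    prompt='retro hip-hop album cover, bold clean typography reading "STREET DREAMS"',
    image=init_image,
    mask_image=mask_image,
    strength=0.6,        # low enough to keep the rough layout, high enough to clean up letterforms
    guidance_scale=7.5,
).images[0]
result.save("cover_fixed.png")
```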
zdragnar 3 days ago | parent | next
> That's how technology always goes

This is not how AI has ever gone. Every approach so far has either been a total dead end, or the underlying concept got pivoted into a simplified, not-AI tech. This new approach of machine-learning content generation will either keep developing, or it will join everything else in the history of AI by hitting a point where the returns diminish to zero.
| ||||||||||||||
vunderba 3 days ago | parent | prev
Minor correction: Fooocus [1] isn't a LoRA - it's a Gradio-based frontend (in the same vein as Automatic1111, Forge, etc.) for image generation.

And most SOTA models (Imagen, Qwen 20b, etc.) at this point can already handle a fair amount of text in a single T2I generation. Flux Dev can do it as well, provided you're willing to roll a couple of gens.
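[Editor's note: a minimal sketch of the "roll a couple of gens" approach with Flux Dev, assuming the diffusers FluxPipeline; the prompt, seed count, and output names are illustrative only.]

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

prompt = ('retro hip-hop album cover, bold block lettering that reads '
          '"STREET DREAMS", 90s aesthetic, film grain')

# Generate a few candidates with different seeds and keep the one whose text is spelled correctly.
for seed in range(4):
    image = pipe(
        prompt,
        num_inference_steps=28,
        guidance_scale=3.5,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"cover_seed{seed}.png")
```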