roenxi 3 hours ago

Moreover, something AI is quite bad at is combining different imaginative elements.

Compare the AI dinosaur in the article to the commissioned dinosaur. The commission has a vibe created by the eye expression and the glasses. I'd maybe call it chill. The thumbs-up is present but it isn't leading the vibe; we might infer that it is something the dino is doing because he is chill. The gesture is only a tiny part of the image, almost an afterthought.

In the original AI image the dinosaur has its thumb up and seems to be really happy. Big smile, relaxed face. The thumb looms large in the foreground. That would be totally normal for this sort of prompt; I don't expect the AIs to have a lot of thoughtful variety in body language.

So what is interesting is that getting the AI to generate the commission image - one where the thumbs-up looks like a natural consequence of a broader scene - is actually quite hard. The prompter needs to think through all the details of the dinosaur's character that make the gesture natural. It might be too hard to one-shot prompt. Image generators didn't do that the last time I checked; they just provide what is asked for. Human artists (especially the good ones) will identify that as boring and start adding flourishes to keep people's interest.

People end up hoist by their own petard. "A T-Rex giving a thumbs up" isn't an interesting idea, and a good human artist will - instead of following the instruction literally - give people what they asked for and slip in some actually interesting elements, which usually comes down to body language and facial expression that are hard to describe.

meander_water 2 hours ago

Agree. All of the major AI model labs have designed their user interfaces in entirely the wrong way.

Prompting via text alone is a really bad way to generate images. Ideally you want Canny Control to draw an outline of the image with elements in the exact locations where you want them. It's why ComfyUI is so great.

The ability to edit images and specify regions in the image for the prompt is a step in the right direction though. ChatGPT and Gemini have this.

emccue 3 hours ago

(Do give the artist more commissions. They need to eat and are on my shortlist for stuff like this. Here's a sexy Jar-Jar Binks/Garfield hybrid they made: https://bsky.app/profile/dsoart.com/post/3ml2f4aqsf22t)