meander_water 3 hours ago

I think one of the reasons for sloppy images is that non-artistic people don't have the vocabulary to describe images to be produced in interesting styles.

Yes, you can do image-> text on existing styles, but something always gets lost in translation.

Midjourney probably has the best baseline, and --sref is a really easy way to differentiate

roenxi 3 hours ago | parent | next

More than that, something AI is quite bad at is combining different imaginative elements.

Compare the AI dinosaur in the article to the commissioned dinosaur. The commission has a vibe created by the eye expression and the glasses - I'd maybe call it chill. The thumbs-up is present but it isn't leading the vibe; we might infer it's something the dino is doing because he is chill. The gesture is only a tiny part of the image, almost an afterthought.

In the original AI image the dinosaur has its thumb up and seems to be really happy: big smile, relaxed face, thumb looming large in the foreground. That would be totally normal for this sort of prompt; I don't expect the AIs to have a lot of thoughtful variety in body language.

So what's interesting is that getting the AI to generate the commissioned image - one where the thumbs-up looks like a natural consequence of a broader scene - is actually quite hard. The prompter needs to think through all the details of the dinosaur's character that make the gesture feel natural. It might be too hard to one-shot prompt. Image generators, last time I checked, just provide what is asked for. Human artists (especially the good ones) will identify that as boring and start adding flourishes to keep people's interest.

People end up hoist by their own petard. "A T-Rex giving a thumbs-up" isn't an interesting idea on its own, and a good human artist will - instead of following the instruction literally - give people what they actually wanted and slip in some genuinely interesting elements, which usually comes back to body language and facial expression that are hard to describe.

meander_water 2 hours ago | parent | next

Agreed. All of the major AI model labs have designed their user interfaces in entirely the wrong way.

Prompting via text alone is a really bad way to generate images. Ideally you want something like a Canny ControlNet, so you can draw an outline of the image with elements in the exact locations where you want them. It's why ComfyUI is so great.
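Text prompts can't pin down spatial layout, which is exactly what a Canny-style ControlNet addresses: you turn a rough sketch into an edge map and the model preserves your composition. Here is a minimal sketch of just the preprocessing step, using a crude gradient-magnitude edge map in place of a real Canny detector (the `edge_map` helper and its threshold are illustrative, not any library's API):

```python
import numpy as np

def edge_map(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Crude edge map via finite-difference gradient magnitude.

    Stand-in for the Canny preprocessor a ControlNet workflow
    (e.g. in ComfyUI) uses to turn a rough sketch into a
    conditioning image. `gray` is a float array in [0, 1].
    """
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:] = gray[:, 1:] - gray[:, :-1]   # horizontal gradient
    gy[1:, :] = gray[1:, :] - gray[:-1, :]   # vertical gradient
    mag = np.hypot(gx, gy)                   # gradient magnitude
    return (mag > threshold).astype(np.uint8) * 255

# A synthetic "sketch": a white square on a black 8x8 canvas.
canvas = np.zeros((8, 8))
canvas[2:6, 2:6] = 1.0
edges = edge_map(canvas)  # 255 along the square's border, 0 elsewhere
```

In a real pipeline this edge image would be fed to the ControlNet as the conditioning input alongside the text prompt, so the prompt only has to describe style and content, not layout.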

The ability to edit images and specify regions of the image for the prompt is a step in the right direction, though. ChatGPT and Gemini have this.

emccue 3 hours ago | parent | prev

(do give the artist more commissions. They need to eat and are on my shortlist for stuff like this. Here's a sexy Jar-Jar Binks/Garfield hybrid they made https://bsky.app/profile/dsoart.com/post/3ml2f4aqsf22t)

tokioyoyo 3 hours ago | parent | prev

Sure, but let me flip the question: how would the user react if they knew that said prompt-crafted image was AI-generated? The industry will need to do better to sell it to the young generation, which is usually the tastemaker for the future. It is considered "low class" to use AI-generated images. If the game is "conceal that it was AI-generated", then... lol.

emccue 3 hours ago | parent

Yeah, people seem to think the issue is that the output isn't "high quality enough," which is a super strange misconception about the role of art even in a commercial setting - as if, once it gets "good" in some mechanical way, people will start to like it.