eterm 6 days ago

I have a "go to" prompt for images:

> In the style of a 1970s book sci-fi novel cover: A spacer walks towards the frame. In the background his spaceship crashed on an icy remote planet. The sky behind is dark and full of stars.

Nano Banana Pro via Gemini did really well, although still way too detailed, and it then made a mess of different decades when I asked it to follow up: https://gemini.google.com/share/1902c11fd755

It's therefore really disappointing that GPT-image 1.5 did this:

https://chatgpt.com/share/6941ed28-ed80-8000-b817-b174daa922...

Completely generic, not at all like a book cover. It completely ignored that part of the prompt while it focused on the other elements.

Did it get the other details right? Sure, maybe even better, but the important part it just ignored completely.

And it's doing even worse when I try to get it to correct the mistake. It's just repeating the same thing with more "weathering".

bongodongobob 6 days ago | parent [-]

You're just not describing what you want properly. Looks fine to me. Clearly you have something else in mind, so I think you're just not describing it well. My tip would be to use actual illustration language. Do you want a wide angle shot? What should depth of field be? Oil painting print? Ink illustration? What kind of printing style? Do you want a photo of the book or a pre-print proof? What kind of color scheme?

A professional artist wouldn't know what you want.

You didn't even specify an art style. 1970s sci-fi novel cover isn't a style. You'll find vastly different art styles from the 70s. If you're disappointed, it's because you're doing a shitty job describing what's in your head. If your prompt isn't at least a paragraph, you're going to just get random generic results.

eterm 6 days ago | parent [-]

The killer feature of LLMs is to be able to extrapolate what's really wanted from short descriptions.

Look again at Gemini's output, it looks like an actual book cover, it looks like an illustration that could be found on a book.

It takes on board corrections (albeit hilariously literally).

Look at GPT Image's output: it doesn't look anything like a book cover, and when told it got it wrong, it just doubles down on what it was doing.

bongodongobob 5 days ago | parent [-]

What you want, and what you think image generation is, is impossible.

eterm 5 days ago | parent [-]

And yet we can see Gemini do what I wanted, so it's clearly not impossible.

bongodongobob 5 days ago | parent [-]

What you've found is a prompt that returns what you want on Gemini. That's all.

eterm 5 days ago | parent [-]

It's a prompt I've been using for years. Gemini has been the best of the bunch, but Nano Banana, Midjourney, etc. all did okay to various degrees.

GPT Image bombed notably worse than the others. The problem wasn't the original picture itself, but the complete lack of recognition of my feedback that it hadn't got it right; it just doubled down on the image it had generated.