raincole 20 hours ago

ChatGPT's prompt adherence is light years ahead of all the others. I wouldn't even call Flux/Midjourney its competitors. ChatGPT image gen is practically a one-of-a-kind product on the market: the only usable AI image editor for people without image editing experience.

I think in terms of image generation, ChatGPT is the biggest leap since Stable Diffusion's release. LoRA/ControlNet/Flux are forgettable in comparison.

thegeomaster 19 hours ago | parent | next [-]

Well, there's also gemini-2.0-flash-exp-image-generation. Also autoregressive/transfusion based.

Yiling-J 15 hours ago | parent | next [-]

gemini-2.0-flash-exp-image-generation doesn’t perform as well as GPT-4o's image generation, as mentioned in section 5.1 of this paper: https://arxiv.org/pdf/2504.02782. However, based on my tests, the results are quite good for certain types of images, such as realistic recipe images. You can see some examples here: https://github.com/Yiling-J/tablepilot/tree/main/examples/10...

thefourthchime 18 hours ago | parent | prev | next [-]

Such a good name....

raincole 12 hours ago | parent | prev | next [-]

It's quite bad now, but I have no doubt that Google will catch up.

The AI field looks awfully like {OpenAI, Google, The Irrelevant}.

yousif_123123 17 hours ago | parent | prev | next [-]

It's also good, but clearly still not close. Maybe Gemini 2.5 or 3 will have better image gen.

swyx 13 hours ago | parent | prev [-]

> transfusion based.

what is that?

echelon 15 hours ago | parent | prev | next [-]

I'd go out on a limb and say that even your praise of gpt-image-1 is underselling its true potential. This model is as remarkable as when ChatGPT first entered the market. People are sleeping on its capabilities. It's a replacement for ComfyUI and potentially most of Adobe in time.

Now for the bad part: I don't think Black Forest Labs, StabilityAI, MidJourney, or any of the others can compete with this. They probably don't have the money to train something this large and sophisticated. We might be stuck with OpenAI and Google (soon) for providing advanced multimodal image models.

Maybe we'll get lucky and one of the large Chinese tech companies will drop a model with this power. But I doubt it.

This might be the first OpenAI product with an extreme moat.

raincole 13 hours ago | parent [-]

> Now for the bad part: I don't think Black Forest Labs, StabilityAI, MidJourney, or any of the others can compete with this.

Yeah. I'm a tad sad about it. I once thought the SD ecosystem proved that open source had won when it comes to image gen (a naive idea, I know). It turns out big corps won hard in this regard.

soared 19 hours ago | parent | prev [-]

This is a take so incredible it doesn’t seem credible.

stavros 19 hours ago | parent | next [-]

I can confirm: ChatGPT's prompt adherence is so incredibly good that it gets even really small details right, to a level that diffusion-based generators couldn't even dream of.

mediaman 19 hours ago | parent | prev | next [-]

It is correct: the shift from diffusion to transformers makes a very, very big difference.

abhpro 16 hours ago | parent | prev | next [-]

Also chiming in to say you're wrong, I mean they're correct

tacoooooooo 19 hours ago | parent | prev [-]

It's 100% the correct take.

fkyoureadthedoc 19 hours ago | parent [-]

Yeah, this is my personal experience. The new image generation is the only reason I keep an OpenAI subscription rather than switching to Google.