Remix clone Hacker News

> a cute dog hugs a cute cat

This prompt is best served by Midjourney, Flux, or Stable Diffusion. It'll be far cheaper, and chances are it'll also look a lot better.

The place where gpt-image-1 shines is if you want to do a prompt like:

"a cute dog hugs a cute cat, they're both standing on top of an algebra equation (y = 2x^2 - 3x - 2). Use the first reference image I uploaded as a source for the style of the dog. Same breed, same markings. The cat can contrast in fur color. Use the second reference image I uploaded as a guide for the background, but change the lighting to sunset. Also, solve the equation for x."

gpt-image-1 doesn't make the best images, and it isn't cheap, and it isn't fast, but it's incredibly -- almost insanely -- powerful. It feels like ComfyUI got packed up into an LLM and provided as a natural language service.

stavros 14 hours ago | parent [-]

I wonder if we can use gpt-image-1 outputs, with some noise, as inputs to diffusion models, so GPT takes care of adherence and the diffusion model improves the quality. Does anyone know whether that's at all possible?

	AuryGlenz 10 hours ago | parent | next [-]
		Sure. With API support landing 3 hours ago, I suppose someone probably made a Comfy node all of 2 hours ago. From there you can either just do a low denoise or use one of the many IP-Adapter-type things out there.
	levzzz 12 hours ago | parent | prev [-]
		Yes, it's what a lot of people have been doing with newer models that have better prompt adherence: passing their outputs through older models with better aesthetics.
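
For anyone wondering what "a low denoise" means mechanically, here is a rough sketch, not anyone's actual pipeline. It assumes the common img2img convention where a strength value in [0, 1] decides how much noise is added to the input image and how many of the sampler's steps are then run. The function names (`img2img_schedule`, `add_noise`) and the flat list of pixel values are made up for illustration; a real setup would do this on latents inside ComfyUI or a diffusion library.

```python
import math
import random


def img2img_schedule(num_inference_steps, strength):
    """Return (start_step, steps_run) for an img2img pass.

    strength=0.0 leaves the input untouched (no steps run);
    strength=1.0 ignores the input and denoises from pure noise.
    A "low denoise" is a small strength, e.g. 0.2 to 0.3.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_step = num_inference_steps - steps_run
    return start_step, steps_run


def add_noise(pixels, alpha_bar, rng):
    """Forward-diffuse the input: x_t = sqrt(alpha_bar)*x_0 + sqrt(1-alpha_bar)*eps.

    alpha_bar close to 1.0 keeps most of the source image (low denoise);
    alpha_bar close to 0.0 replaces it almost entirely with Gaussian noise.
    """
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [keep * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    # Hypothetical numbers: a 40-step sampler at strength 0.25 skips the
    # first 30 steps and only runs the final 10 on the lightly noised input.
    print(img2img_schedule(40, 0.25))
    print(add_noise([0.5, -0.5, 0.0], alpha_bar=0.9, rng=random.Random(0)))
```

The point is that the composition from the gpt-image-1 output survives because only the tail end of the denoising schedule runs, so the second model mostly reworks texture and fine detail.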