minimaxir 20 hours ago

"Image from a reference" is a bit of a rabbit hole. For a traditional image generation model to learn a reference, you have to fine-tune it (LoRA) and/or use a conditioning model to constrain the output (InstantID/ControlNet).

The interesting part of this GPT-4o API is that it doesn't need to learn the reference. But given the cost of `high` quality image generation, it's much cheaper to train a LoRA for Flux 1.1 Pro and generate from that.

thot_experiment 20 hours ago

Reflux is fantastic for the basic reference-image-based editing most people are using this for, but 4o is far more powerful than any existing model because of its large scale and cross-modal understanding. There are things possible with 4o that are just 100% impossible with diffusion models (full glass of wine, horse riding an astronaut, room without pink elephants, etc.).

Tiberium 20 hours ago

Imagen supports image references in the API as well, but only on Vertex, not on the Gemini API yet.

BoorishBears 12 hours ago

Imagen references don't feel very useful at all. At most it feels like an afterthought meant to make product photoshoots easier.